Claude Elwood Shannon 
Miscellaneous Writings 

Edited by 

N. J. A. Sloane 
Aaron D. Wyner 

Back in 1993, the late Aaron Wyner and I edited Claude Elwood Shannon's 
papers, and most of them appeared in a volume (Claude Elwood 
Shannon's Collected Papers) which was published by the IEEE Press. 

However, there were a number of items written by Shannon of lesser 
interest which we did not include (some declassified wartime memoranda, 
obscure AT&T Bell Labs memos, some mimeographed MIT lecture notes, etc.). 

These we put into a binder, held together by an Acco metal strip. 

We made half a dozen copies and gave them to the Library of Congress, the British
Library, the Bell Laboratories Library, the MIT Library, Claude Shannon himself,
and one or two other places.

Over the years many people have asked me if it was possible to get access 
to this collection. 

I have now had this volume scanned and converted to PDF files.
The total size of the files is about 450 megabytes. 

Neil J. A. Sloane, October 13, 2013 

Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill, 
New Jersey 07974 


File 1: Front matter

This volume contains the following items. Bracketed numbers refer to the bibliography.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.

[11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense
Research Committee, July 15, 1943, 9 pp.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944,
Bell Laboratories, 2 pp. + 3 Figs.

(Note that many of these files contain more than one document.)

[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell 
Laboratories, 1 p. + 1 fig. 

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," 
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.

[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, 
August 4, 1944, Bell Laboratories, 4 pp. 

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.

[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell 
Laboratories, 17 pp. 

[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in 
Fire-Control Systems," Summary Technical Report, Div. 7, National Defense 
Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 
and 166-167. AD 200795. Also in National Military Establishment Research and 
Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by 
[51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory 
and Practice, Addison-Wesley, Reading, Mass., 1965. 

[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four- 
Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.

[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, 
Bell Laboratories, 5 pp. + 1 fig. 

[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 
pp. + 1 fig. 

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp. 

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp. 

[45] "Significance and Application [of Communication Research]," Symposium on 
Communication Research, 11-13 October, 1948, Research and Development 
Board, Department of Defense, Washington, DC, pp. 14-23, 1948. 

[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell 
Laboratories, 1 p. 

[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 
1948, Bell Laboratories, 2 pp. 

[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell
Laboratories, 2 pp. + 2 Figs.

File 104 

[49] "Information Theory," Typescript of abstract of talk for American Statistical 
Society, 1949, 5 pp. 

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp.

[59] "A Digital Method of Transmitting Information," Typescript, no date, circa 
1950, Bell Laboratories, 3 pp. 

[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM
53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp.

[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.

[81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving 
the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp. 

[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM
53-140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.

[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited 
Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig. 

[95] "Concavity of Transmission Rate as a Function of Input Probabilities," 
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.

[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:

"A skeleton key to the information theory notes," 3 pp.
"Bounds on the tails of martingales and related questions," 19 pp.
"Some useful inequalities for distribution functions," 3 pp.
"A lower bound on the tail of a distribution," 9 pp.
"A combinatorial theorem," 1 p.
"Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp.
"The number of sequences of a given length," 3 pp.
"Characteristic for a language with independent letters," 4 pp.
"The probability of error in optimal codes," 5 pp.
"Zero error codes and the zero error capacity C_0," 10 pp.
"Lower bound for P_ef for a completely connected channel with feedback," 1 p.
"A lower bound for P_e when R > C," 2 pp.
"A lower bound for P_e," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp.
"A result for the memoryless feedback channel," 1 p.
"Continuity of P_e,opt as a function of transition probabilities," 1 p.
"Codes of a fixed composition," 1 p.
"Relation of P_e to p," 2 pp.
"Bound on P_e for random code by simple threshold argument," 4 pp.
"A bound on P_e for a random code," 3 pp.
"The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.

"Inequalities for decodable codes," 3 pp.
"Convexity of channel capacity as a function of transition probabilities," 1 p.
"A geometric interpretation of channel capacity," 6 pp.
"Log moment generating function for the square of a Gaussian variate," 2 pp.
"Upper bound on P_e for Gaussian channel by expurgated random code," 2 pp.
"Lower bound on P_e in Gaussian channel by minimum distance argument," 2 pp.
"The sphere packing bound for the Gaussian power limited channel," 4 pp.
"The T-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp.
"The central limit theorem with large deviations," 6 pp.
"The Chernoff inequality," 2 pp.
"Upper and lower bounds on the tails of distributions," 4 pp.
"Asymptotic behavior of the distribution function," 5 pp.
"Generalized Chebycheff and Chernoff inequalities," 1 p.
"Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp.
"Error probability bounds for noisy channels," 20 pp.

[105] "Reliable Machines from Unreliable Components," notes of five lectures,
Massachusetts Institute of Technology, Spring 1956, 24 pp.

[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.

[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture,
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.

[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7
pp. + 8 figs.

[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.

This volume contains all of Claude Elwood Shannon's writings that we did not include in
his Collected Papers.*

* Claude Elwood Shannon: Collected Papers, edited by N. J. A. Sloane and A. D. Wyner, IEEE Press, 
New York, 1993, xliv + 924 pp. ISBN 0-7803-0434-9. 


Photograph of Claude Shannon at Bell Labs in May 1952. Caption: "In 1952, Claude E. 
Shannon of Bell Laboratories devised an experiment to illustrate the capabilities of 
telephone relays. Here, an electrical mouse finds its way unerringly through a maze, 
guided by information remembered in the kind of switching relays used in dial telephone 
systems. Experiments with the mouse helped stimulate Bell Laboratories researchers to 
think of new ways to use the logical powers of computers for operations other than 
numerical calculation." 

Photograph of Claude Shannon and Dave Hagelbarger at Bell Labs in March 1955. 
Caption: "Claude Shannon, the originator of Information Theory, at the board and Dave 
Hagelbarger work out some equations needed. Their current projects include work on 
automata-advanced type of computing machines which are able to perform various 
thought functions."

Photograph of Claude Shannon taken in the 1980s. Photographer unknown.

Bibliography of Claude Elwood Shannon. Comments such as "Included in Part B" refer 
to Parts A, B, C, D of the Collected Papers mentioned in the Preface. 

This volume contains the following items. Bracketed numbers refer to the bibliography. 

[5] 4 The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum 
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs. 

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders," 
Report to National Defense Research Committee, Div. 7-31 1-M1, circa April, 
1941,37 pp. + 15 figs. 

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research 
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs. 

[11] "Some Experimental Results on the Deflection Mechanism," Report to National 
Defense Research Committee, Div. 7-31 1 -Ml, June 26, 1941, 1 1 pp. 

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8, 
1941,5 pp. + 3 figs. 

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen 
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense 
Research Committee, July 15, 1943, 9 pp. 

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944, 
Bell Laboratories, 2 pp. + 3 Figs. 


[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell 
Laboratories, 1 p. + 1 fig. 

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," 
Memorandum MM 44-1 10-37, June 1, 1944, Bell Laboratories, 4 pp., 1 1 figs. 

[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, 
August 4, 1944, Bell Laboratories, 4 pp. 

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-1 10-02, Sept. 
1, 1945, Bell Laboratories, 1 14 pp. + 25 figs. 

[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell 
Laboratories, 17 pp. 

[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in 
Fire-Control Systems," Summary Technical Report, Div. 7, National Defense 
Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 
and 166-167. AD 200795. Also in National Military Establishment Research and 
Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by 
[51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory 
and Practice, Addison- Wesley, Reading, Mass., 1965. 

[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four- 
Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 
46-1 10-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs. 

[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, 
Bell Laboratories, 5 pp. + 1 fig. 

[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 
pp. + 1 fig. 

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 

[35] "Systems Which Approach the Ideal as P/N -> <»," Typescript, March 15, 
1948, 2 pp. 

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp. 

[45] "Significance and Application [of Communication Research]," Symposium on 
Communication Research, 11-13 October, 1948, Research and Development 
Board, Department of Defense, Washington, DC, pp. 14-23, 1948. 

[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell 
Laboratories, 1 p. 

[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 
1948, Bell Laboratories, 2 pp. 

[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6 1948, Bell 
Laboratories, 2 pp. + 2 Figs. 


[49] "Information Theory," Typescript of abstract of talk for American Statistical 
Society, 1949, 5 pp. 

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 

[59] "A Digital Method of Transmitting Information," Typescript, no date, circa 
1950, Bell Laboratories, 3 pp. 

[72] ' 'Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp. 

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400- 
9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. 

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 

[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp. 

[81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving 
the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp. 

[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53- 
140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs. 

[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited 
Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig. 

[95] "Concavity of Transmission Rate as a Function of Input Probabilities," 
Memorandum MM 55-1 14-28, June 8, 1955, Bell Laboratories. 

[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology, 
1956 and succeeding years. Contains the following sections: 

"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of 
martingales and related questions," 19 pp. "Some useful inequalities for 
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9 
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp. 
"Upper and lower bounds for powers of a matrix with non-negative elements," 3 
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a 
language with independent letters," 4 pp. "The probability of error in optimal 
codes," 5 pp. "Zero error codes and the zero error capacity C ," 10 pp. 
"Lower bound for P e j for a completely connected channel with feedback," 1 p. 
"A lower bound for P e when R > C," 2 pp. "A lower bound for P e " 2 pp. 
"Lower bound with one type of input and many types of output," 3 pp. 
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for 
the memoryless feedback channel," 1 p. "Continuity of P e opt as a function of 
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of 
P e to p," 2 pp. "Bound on P e for random ode by simple threshold argument," 4 
pp. "A bound on P e for a random code," 3 pp. "The Feinstein bound," 2 pp. 
"Relations between probability and minimum word separation," 4 pp. 


"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a 
function of transition probabilities," 1 pp. "A geometric interpretation of 
channel capacity," 6 pp. "Log moment generating function for the square of a 
Gaussian variate," 2 pp. "Upper bound on P e for Gaussian channel by 
expurgated random code," 2 pp. "Lower bound on P e in Gaussian channel by 
minimum distance argument," 2 pp. "The sphere packing bound for the 
Gaussian power limited channel," 4 pp. "The jT-terminal channel," 7 pp. 
"Conditions for constant mutual information," 2 pp. "The central limit theorem 
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and 
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the 
distribution function," 5 pp. "Generalized Chebycheff and Chernoff 
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp. 
"Some miscellaneous results in coding theory," 15 pp. "Error probability 
bounds for noisy channels," 20 pp. 

[105] "Reliable Machines from Unreliable Components," notes of five lectures, 
Massachusetts Institute of Technology, Spring 1956, 24 pp. 

[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by 
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp. 

[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a 
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp. 

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture, 
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp. 

[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American 
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7 
pp. + 8 figs. 

[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp. 

Bibliography of Claude Elwood Shannon

[1] "A Symbolic Analysis of Relay and Switching Circuits," Transactions
American Institute of Electrical Engineers, Vol. 57 (1938), pp. 713-723.
(Received March 1, 1938.) Included in Part B.

[2] Letter to Vannevar Bush, Feb. 16, 1939. Printed in F.-W. Hagemeyer,
Die Entstehung von Informationskonzepten in der Nachrichtentechnik:
eine Fallstudie zur Theoriebildung in der Technik in Industrie- und
Kriegsforschung [The Origin of Information Theory Concepts in
Communication Technology: Case Study for Engineering Theory-Building
in Industrial and Military Research], Doctoral Dissertation,
Free Univ. Berlin, Nov. 8, 1979, 570 pp. Included in Part A.

[3] "An Algebra for Theoretical Genetics," Ph.D. Dissertation, Department
of Mathematics, Massachusetts Institute of Technology, April 15, 1940,
69 pp. Included in Part C.

[4] "A Theorem on Color Coding," Memorandum 40-130-153, July 8,
1940, Bell Laboratories. Superseded by "A Theorem on Coloring the
Lines of a Network." Not included.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender,"
Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp.

[7] "A Study of the Deflection Mechanism and Some Results on Rate
Finders," Report to National Defense Research Committee, Div. 7-311-M1,
circa April, 1941, 37 pp. + 15 figs. Included in this volume.

[8] "Backlash in Overdamped Systems," Report to National Defense
Research Committee, Princeton Univ., May 14, 1941, 6 pp. Abstract
only included in Part B.

[9] "A Height Data Smoothing Mechanism," Report to National Defense
Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941,
9 pp. + 9 figs. Included in this volume.

[10] "The Theory of Linear Differential and Smoothing Operators," Report
to National Defense Research Committee, Div. 7-313.1-M1, Princeton
Univ., June 8, 1941, 11 pp. Not included.

[11] "Some Experimental Results on the Deflection Mechanism," Report to
National Defense Research Committee, Div. 7-311-M1, June 26, 1941,
11 pp. Included in this volume.


[12] "Criteria for Consistency and Uniqueness in Relay Circuits," 
Typescript, Sept. 8, 1941, 5 pp. + 3 figs. Included in this volume. 

[13] "The Theory and Design of Linear Differential Equation Machines," 
Report to the Services 20, Div. 7-31 1-M2, Jan. 1942, Bell Laboratories, 
73 pp. + 30 figs. Included in Part B. 

[14] (With John Riordan) "The Number of Two-Terminal Series-Parallel 
Networks," Journal of Mathematics and Physics, Vol. 21 (August, 
1942), pp. 83-93. Included in Part B. 

[15] "Analogue of the Vernam System for Continuous Time Series," 
Memorandum MM 43-110-44, May 10, 1943, Bell Laboratories, 4 pp. + 
4 figs. Included in Part A. 

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the 
Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1, 
National Defense Research Committee, July 15, 1943, 9 pp. Included in 
this volume. 

[17] "Pulse Code Modulation," Memorandum MM 43-110-43, December 1,
1943, Bell Laboratories. Not included.

[18] "Feedback Systems with Periodic Loop Closure," Memorandum MM
44-110-32, March 16, 1944, Bell Laboratories. Not included.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29,
1944, Bell Laboratories, 2 pp. + 3 Figs. Included in this volume.

[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 
1944, Bell Laboratories, 1 p. + 1 fig. Included in this volume. 

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," 
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.
Included in this volume.

[22] "The Best Detection of Pulses," Memorandum MM 44-110-28, June 22,
1944, Bell Laboratories, 3 pp. Included in Part A. 

[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," 
Typescript, August 4, 1944, Bell Laboratories, 4 pp. Included in this volume.

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02,
Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs. Superseded
by the following paper. Included in this volume. 

[25] "Communication Theory of Secrecy Systems," Bell System Technical 
Journal, Vol. 28 (1949), pp. 656-715. "The material in this paper 
appeared originally in a confidential report 'A Mathematical Theory of 
Cryptography', dated Sept. 1, 1945, which has now been declassified." 
Included in Part A. 


[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, 
Bell Laboratories, 17 pp. Included in this volume. 

[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and 
Prediction in Fire-Control Systems," Summary Technical Report, 
Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control,
Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in 
National Military Establishment Research and Development Board, 
Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R. 
B. Blackman, Linear Data-Smoothing and Prediction in Theory and 
Practice, Addison-Wesley, Reading, Mass., 1965. Included in this volume.

[28] (With B. M. Oliver) "Communication System Employing Pulse Code 
Modulation," Patent 2,801,281. Filed Feb. 21, 1946, granted July 30, 
1957. Not included. 

[29] (With B. D. Holbrook) "A Sender Circuit For Panel or Crossbar 
Telephone Systems," Patent application circa 1946, application dropped 
April 13, 1948. Not included. 

[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of 
Four-Terminal Unilateral Linear Networks Connected in Tandem," 
Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp. 
+ 16 figs. Included in this volume. 

[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 
1946, Bell Laboratories, 5 pp. + 1 fig. Included in this volume. 

[32] "Some Generalizations of the Sampling Theorem," Typescript, March 
4, 1948, 5 pp. + 1 fig. Included in this volume. 

[33] (With J. R. Pierce and J. W. Tukey) "Cathode-Ray Device," Patent 
2,576,040. Filed March 10, 1948, granted Nov. 20, 1951. Not included. 

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 
1948, 5 pp. Included in this volume. 

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March
15, 1948, 2 pp. Included in this volume. 

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp. 
Included in this volume. 

[37] "A Mathematical Theory of Communication," Bell System Technical 
Journal, Vol. 27 (July and October 1948), pp. 379-423 and 623-656. 
Reprinted in D. Slepian, editor, Key Papers in the Development of 
Information Theory, IEEE Press, NY, 1974. Included in Part A. 

[38] (With Warren Weaver) The Mathematical Theory of Communication, 
University of Illinois Press, Urbana, IL, 1949, vi + 117 pp. Reprinted
(and repaginated) 1963. The section by Shannon is essentially identical
to the previous item. Not included.

[39] (With Warren Weaver) Mathematische Grundlagen der 
Informationstheorie, Scientia Nova, Oldenbourg Verlag, Munich, 1976, 
143 pp. German translation of the preceding book. Not included.

[40] (With B. M. Oliver and J. R. Pierce) "The Philosophy of PCM," 
Proceedings Institute of Radio Engineers, Vol. 36 (1948), pp. 1324- 
1331. (Received May 24, 1948.) Included in Part A. 

[41] "Samples of Statistical English," Typescript, June 11, 1948, Bell 
Laboratories, 3 pp. Included in this volume. 

[42] "Network Rings," Typescript, June 11, 1948, Bell Laboratories, 26 pp. 
+ 4 figs. Included in Part B. 

[43] "Communication in the Presence of Noise," Proceedings Institute of 
Radio Engineers, Vol. 37 (1949), pp. 10-21. (Received July 23, 1940 
[1948?].) Reprinted in D. Slepian, editor, Key Papers in the 
Development of Information Theory, IEEE Press, NY, 1974. Reprinted 
in Proceedings Institute of Electrical and Electronic Engineers, Vol. 72 
(1984), pp. 1192-1201. Included in Part A. 

[44] "A Theorem on Coloring the Lines of a Network," Journal of 
Mathematics and Physics, Vol. 28 (1949), pp. 148-151. (Received Sept. 
14, 1948.) Included in Part B. 

[45] "Significance and Application [of Communication Research]," 
Symposium on Communication Research, 11-13 October, 1948, Research 
and Development Board, Department of Defense, Washington, DC, pp. 
14-23, 1948. Included in this volume. 

[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 
1948, Bell Laboratories, 1 p. Included in this volume. 

[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, 
Nov. 18, 1948, Bell Laboratories, 2 pp. Included in this volume. 

[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6,
1948, Bell Laboratories, 2 pp. + 2 Figs. Included in this volume.

[49] "Information Theory," Typescript of abstract of talk for American 
Statistical Society, 1949, 5 pp. Included in this volume. 

[50] "The Synthesis of Two-Terminal Switching Circuits," Bell System 
Technical Journal, Vol. 28 (Jan., 1949), pp. 59-98. Included in Part B. 

[51] (With H. W. Bode) "A Simplified Derivation of Linear Least Squares 
Smoothing and Prediction Theory," Proceedings Institute of Radio 
Engineers, Vol. 38 (1950), pp. 417-425. (Received July 13, 1949.) 
Included in Part B. 


[52] "Review of Transformations on Lattices and Structures of Logic by 
Stephen A. Kiss," Proceedings Institute of Radio Engineers, Vol. 37 
(1949), p. 1163. Included in Part B.

[53] "Review of Cybernetics, or Control and Communication in the Animal 
and the Machine by Norbert Wiener," Proceedings Institute of Radio 
Engineers, Vol. 37 (1949), p. 1305. Included in Part B. 

[54] "Programming a Computer for Playing Chess," Philosophical 
Magazine, Series 7, Vol. 41 (No. 314, March 1950), pp. 256-275. 
(Received Nov. 8, 1949.) Reprinted in D. N. L. Levy, editor, Computer 
Chess Compendium, Springer-Verlag, NY, 1988. Included in Part B.

[55] "A Chess-Playing Machine," Scientific American, Vol. 182 (No. 2, 
February 1950), pp. 48-51. Reprinted in The World of Mathematics, 
edited by James R. Newman, Simon and Schuster, NY, Vol. 4, 1956, pp. 
2124-2133. Included in Part B. 

[56] "Memory Requirements in a Telephone Exchange," Bell System 
Technical Journal, Vol. 29 (1950), pp. 343-349. (Received Dec. 7,
1949.) Included in Part B.

[57] "A Symmetrical Notation for Numbers," American Mathematical 
Monthly, Vol. 57 (Feb., 1950), pp. 90-93. Included in Part B. 

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell 
Laboratories, 2 pp. Included in this volume. 

[59] "A Digital Method of Transmitting Information," Typescript, no date, 
circa 1950, Bell Laboratories, 3 pp. Included in this volume. 

[60] "Communication Theory — Exposition of Fundamentals," in "Report 
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory, No. 1 (February, 1953), pp. 44-47. Included in Part A. 

[61] "General Treatment of the Problem of Coding," in "Report of 
Proceedings, Symposium on Information Theory, London, Sept., 1950," 
Institute of Radio Engineers, Transactions on Information Theory, No. 1 
(February, 1953), pp. 102-104. Included in Part A. 

[62] "The Lattice Theory of Information," in "Report of Proceedings, 
Symposium on Information Theory, London, Sept., 1950," Institute of 
Radio Engineers, Transactions on Information Theory, No. 1 (February, 
1953), pp. 105-107. Included in Part A. 

[63] (With E. C. Cherry, S. H. Moss, Dr. Uttley, I. J. Good, W. Lawrence and 
W. P. Anderson) "Discussion of Preceding Three Papers," in "Report 
of Proceedings, Symposium on Information Theory, London, Sept., 
1950," Institute of Radio Engineers, Transactions on Information 
Theory, No. 1 (February, 1953), pp. 169-174. Included in Part A. 

[64] "Review of Description of a Relay Computer, by the Staff of the 
[Harvard] Computation Laboratory," Proceedings Institute of Radio 
Engineers, Vol. 38 (1950), p. 449. Included in Part B. 

[65] "Recent Developments in Communication Theory," Electronics, Vol. 
23 (April, 1950), pp. 80-83. Included in Part A. 

[66] German translation of [65], in Tech. Mitt. P.T.T., Bern, Vol. 28 (1950), 
pp. 337-342. Not included. 

[67] "A Method of Power or Signal Transmission To a Moving Vehicle," 
Memorandum for Record, July 19, 1950, Bell Laboratories, 2 pp. + 4 
figs. Included in Part B. 

[68] "Some Topics in Information Theory," in Proceedings International 
Congress of Mathematicians (Cambridge, Mass., Aug. 30 - Sept. 6, 1950),
American Mathematical Society, Vol. II (1952), pp. 262-263. Included
in Part A. 

[69] "Prediction and Entropy of Printed English," Bell System Technical 
Journal, Vol. 30 (1951), pp. 50-64. (Received Sept. 15, 1950.) 
Reprinted in D. Slepian, editor, Key Papers in the Development of 
Information Theory, IEEE Press, NY, 1974. Included in Part A. 

[70] "Presentation of a Maze Solving Machine," in Cybernetics: Circular, 
Causal and Feedback Mechanisms in Biological and Social Systems, 
Transactions Eighth Conference, March 15-16, 1951, New York, N. Y.,
edited by H. von Foerster, M. Mead and H. L. Teuber, Josiah Macy Jr. 
Foundation, New York, 1952, pp. 169-181. Included in Part B. 

[71] "Control Apparatus," Patent application Aug. 1951, dropped Jan. 21, 
1954. Not included. 

[72] "Creative Thinking," [...] 20, 1952, Bell Laboratories, 10
pp. Included in this volume.

[73] "A Mind-Reading (?) Machine," Typescript, March 18, 1953, Bell
Laboratories, 4 pp. Included in Part B.

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM
53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. Included
in this volume.

[75] "The Potentialities of Computers," Typescript, April 3, 1953, Bell
Laboratories. Included in Part B.

[76] "Throbac I," Typescript, April 9, 1953, Bell Laboratories, 5 pp.
Included in Part B.

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell 
Laboratories, 7 pp. Included in this volume. 


[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp. 
Included in this volume. 

[79] (With E. F. Moore) "Electrical Circuit Analyzer," Patent 2,776,405. 
Filed May 18, 1953, granted Jan. 1, 1957. Not included. 

[80] (With E. F. Moore) "Machine Aid for Switching Circuit Design," 
Proceedings Institute of Radio Engineers, Vol. 41 (1953), pp. 1348- 
1351. (Received May 28, 1953.) Included in Part B. 

[81] "Mathmanship or How to Give an Explicit Solution Without Actually 
Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp. 
Included in this volume. 

[82] "Computers and Automata," Proceedings Institute of Radio Engineers,
Vol. 41 (1953), pp. 1234-1241. (Received July 17, 1953.) Reprinted in
Methodos, Vol. 6 (1954), pp. 115-130. Included in Part B.

[83] "Realization of All 16 Switching Functions of Two Variables Requires 
18 Contacts," Memorandum MM 53-1400-40, November 17, 1953, Bell 
Laboratories, 4 pp. + 2 figs. Included in Part B. 

[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum 
MM 53-140-52, November 30, 1953, Bell Laboratories, 26 pp. + 5 figs. 
Included in this volume. 

[85] (With D. W. Hagelbarger) "A Relay Laboratory Outfit for Colleges," 
Memorandum MM 54-114-17, January 10, 1954, Bell Laboratories. 
Included in Part B. 

[86] "Efficient Coding of a Binary Source With One Very Infrequent 
Symbol," Memorandum MM 54-114-7, January 29, 1954, Bell 
Laboratories. Included in Part A. 

[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude 
Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 
Fig. Included in this volume. 

[88] (With Edward F. Moore) "Reliable Circuits Using Crummy Relays," 
Memorandum 54-114-42, Nov. 29, 1954, Bell Laboratories. Published 
as the following two items. 

[89] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays 
I," Journal Franklin Institute, Vol. 262 (Sept., 1956), pp. 191-208. 
Included in Part B. 

[90] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays
II," Journal Franklin Institute, Vol. 262 (Oct., 1956), pp. 281-297.
Included in Part B. 

[91] (Edited jointly with John McCarthy) Automata Studies, Annals of
Mathematics Studies Number 34, Princeton University Press, Princeton,
NJ, 1956, ix + 285 pp. The Preface, Table of Contents, and the two
papers by Shannon are included in Part B.

[92] (With John McCarthy), Studien zur Theorie der Automaten, Munich, 
1974. (German translation of the preceding work.) 

[93] "A Universal Turing Machine With Two Internal States," Memorandum
54-114-38, May 15, 1954, Bell Laboratories. Published in Automata 
Studies, pp. 157-165. Included in Part B. 

[94] (With Karel de Leeuw, Edward F. Moore and N. Shapiro) 
"Computability by Probabilistic Machines," Memorandum 54-114-37, 
Oct. 21, 1954, Bell Laboratories. Published in [91], pp. 183-212.
Included in Part B. 

[95] "Concavity of Transmission Rate as a Function of Input Probabilities," 
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories. Included
in this volume. 

[96] "Some Results on Ideal Rectifier Circuits," Memorandum MM 55-114-29,
June 8, 1955, Bell Laboratories. Included in Part B.

[97] "The Simultaneous Synthesis of s Switching Functions of n Variables," 
Memorandum MM 55-114-30, June 8, 1955, Bell Laboratories. Included
in Part B. 

[98] (With D. W. Hagelbarger) "Concavity of Resistance Functions," 
Journal Applied Physics, Vol. 27 (1956), pp. 42-43. (Received August 1, 
1955.) Included in Part B. 

[99] "Game Playing Machines," Journal Franklin Institute, Vol. 260 (1955),
pp. 447-453. (Delivered Oct. 19, 1955.) Included in Part B.

[100] "Information Theory," Encyclopedia Britannica, Chicago, IL, 14th 
Edition, 1968 printing, Vol. 12, pp. 246B-249. (Written circa 1955.) 
Included in Part A. 

[101] "Cybernetics," Encyclopedia Britannica, Chicago, IL, 14th Edition, 
1968 printing, Vol. 12. (Written circa 1955.) Not included. 

[102] "The Rate of Approach to Ideal Coding (Abstract)," Proceedings 
Institute of Radio Engineers, Vol. 43 (1955), p. 356. Included in Part A. 

[103] "The Bandwagon (Editorial)," Institute of Radio Engineers, 
Transactions on Information Theory, Vol. IT-2 (March, 1956), p. 3. 
Included in Part A. 

[104] "Information Theory," Seminar Notes, Massachusetts Institute of
Technology, 1956 and succeeding years. Included in this volume.
Contains the following sections:

"A skeleton key to the information theory notes," 3 pp.
"Bounds on the tails of martingales and related questions," 19 pp.
"Some useful inequalities for distribution functions," 3 pp.
"A lower bound on the tail of a distribution," 9 pp.
"A combinatorial theorem," 1 p.
"Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp.
"The number of sequences of a given length," 3 pp.
"Characteristic for a language with independent letters," 4 pp.
"The probability of error in optimal codes," 5 pp.
"Zero error codes and the zero error capacity C_0," 10 pp.
"Lower bound for P_ef for a completely connected channel with feedback," 1 p.
"A lower bound for P_e when R > C," 2 pp.
"A lower bound for P_e," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp.
"A result for the memoryless feedback channel," 1 p.
"Continuity of P_e opt as a function of transition probabilities," 1 p.
"Codes of a fixed composition," 1 p.
"Relation of P_e to p," 2 pp.
"Bound on P_e for random code by simple threshold argument," 4 pp.
"A bound on P_e for a random code," 3 pp.
"The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp.
"Convexity of channel capacity as a function of transition probabilities," 1 p.
"A geometric interpretation of channel capacity," 6 pp.
"Log moment generating function for the square of a Gaussian variate," 2 pp.
"Upper bound on P_e for Gaussian channel by expurgated random code," 2 pp.
"Lower bound on P_e in Gaussian channel by minimum distance argument," 2 pp.
"The sphere packing bound for the Gaussian power limited channel," 4 pp.
"The ^-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp.
"The central limit theorem with large deviations," 6 pp.
"The Chernoff inequality," 2 pp.
"Upper and lower bounds on the tails of distributions," 4 pp.
"Asymptotic behavior of the distribution function," 5 pp.
"Generalized Chebycheff and Chernoff inequalities," 1 p.
"Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp.
"Error probability bounds for noisy channels," 20 pp.

[105] "Reliable Machines from Unreliable Components," notes of five
lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp. Not
included.

[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes 
taken by W. W. Peterson, Massachusetts Institute of Technology, Spring, 
1956, 8 pp. Included in this volume. 

[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," 
notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 
3 pp. Included in this volume. 

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a
lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
Included in this volume.

[109] "The Zero Error Capacity of a Noisy Channel," Institute of Radio 
Engineers, Transactions on Information Theory, Vol. IT-2 (September, 
1956), pp. 8-19. Reprinted in D. Slepian, editor, Key Papers in the
Development of Information Theory, IEEE Press, NY, 1974. Included in 
Part A. 

[110] (With Peter Elias and Amiel Feinstein) "A Note on the Maximum Flow 
Through a Network," Institute of Radio Engineers, Transactions on 
Information Theory, Vol. IT-2 (December, 1956), pp. 117-119. 
(Received July 11, 1956.) Included in Part B. 

[111] "Certain Results in Coding Theory for Noisy Channels," Information
and Control, Vol. 1 (1957), pp. 6-25. (Received April 22, 1957.) 
Reprinted in D. Slepian, editor, Key Papers in the Development of 
Information Theory, IEEE Press, NY, 1974. Included in Part A. 

[112] "Geometrische Deutung einiger Ergebnisse bei die Berechnung der 
Kanal Capazitat" [Geometrical meaning of some results in the 
calculation of channel capacity], Nachrichtentechnische Zeit. (N.T.Z.), 
Vol. 10 (No. 1, January 1957), pp. 1-4. Not included, since the English 
version is included. 

[113] "Some Geometrical Results in Channel Capacity," Verband Deutsche 
Elektrotechniker Fachber., Vol. 19 (II) (1956), pp. 13-15 = 
Nachrichtentechnische Fachber. (N.T.F.), Vol. 6 (1957). English version 
of the preceding work. Included in Part A. 

[114] "Von Neumann's Contribution to Automata Theory," Bulletin American
Mathematical Society, Vol. 64 (No. 3, Part 2, 1958), pp. 123-129. 
(Received Feb. 10, 1958.) Included in Part B. 

[115] "A Note on a Partial Ordering for Communication Channels," 
Information and Control, Vol. 1 (1958), pp. 390-397. (Received March 
24, 1958.) Reprinted in D. Slepian, editor, Key Papers in the 
Development of Information Theory, IEEE Press, NY, 1974. Included in 
Part A. 

[116] "Channels With Side Information at the Transmitter," IBM Journal 
Research and Development, Vol. 2 (1958), pp. 289-293. (Received Sept. 
15, 1958.) Reprinted in D. Slepian, editor, Key Papers in the 
Development of Information Theory, IEEE Press, NY, 1974. Included in 
Part A. 

[117] "Probability of Error for Optimal Codes in a Gaussian Channel," Bell 
System Technical Journal, Vol. 38 (1959), pp. 611-656. (Received Oct. 
17, 1958.) Included in Part A. 

[118] "Coding Theorems for a Discrete Source With a Fidelity Criterion," 
Institute of Radio Engineers, International Convention Record, Vol. 7
(Part 4, 1959), pp. 142-163. Reprinted with changes in Information and
Decision Processes, edited by R. E. Machol, McGraw-Hill, NY, 1960, 
pp. 93-126. Reprinted in D. Slepian, editor, Key Papers in the 
Development of Information Theory, IEEE Press, NY, 1974. Included in 
Part A. 

[119] "Two-Way Communication Channels," in Proceedings Fourth Berkeley 
Symposium Probability and Statistics, June 20 - July 30, 1960, edited by
J. Neyman, Univ. Calif. Press, Berkeley, CA, Vol. 1, 1961, pp. 611-644. 
Reprinted in D. Slepian, editor, Key Papers in the Development of 
Information Theory, IEEE Press, NY, 1974. Included in Part A. 

[120] "Computers and Automation — Progress and Promise in the Twentieth 
Century," Man, Science, Learning and Education. The Semicentennial 
Lectures at Rice University, edited by S. W. Higginbotham, Supplement
2 to Vol. XLIX, Rice University Studies, Rice Univ., 1963, pp. 201-211. 
Included in Part B. 

[121] Papers in Information Theory and Cybernetics (in Russian), Izd. Inostr. 
Lit., Moscow, 1963, 824 pp. Edited by R. L. Dobrushin and O. B. 
Lupanova, preface by A. N. Kolmogorov. Contains Russian translations 
of [1], [6], [14], [25], [37], [40], [43], [44], [50], [51], [54]-[56], [65], 
[68]-[70], [80], [82], [89], [90], [93], [94], [99], [103], [109]-[111], 

[122] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error 
Probability for Coding on Discrete Memoryless Channels I," 
Information and Control, Vol. 10 (1967), pp. 65-103. (Received Jan. 18, 
1966.) Reprinted in D. Slepian, editor, Key Papers in the Development 
of Information Theory, IEEE Press, NY, 1974. Included in Part A. 

[123] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error 
Probability for Coding on Discrete Memoryless Channels II,"
Information and Control, Vol. 10 (1967), pp. 522-552. (Received Jan. 
18, 1966.) Reprinted in D. Slepian, editor, Key Papers in the 
Development of Information Theory, IEEE Press, NY, 1974. Included in 
Part A. 

[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the 
American Driver in England," typescript, All Souls College, Oxford, 
Trinity term, 1978, 7 pp. + 8 figs. Included in this volume. 

[125] "Claude Shannon's No-Drop Juggling Diorama," Juggler's World, Vol. 
34 (March, 1982), pp. 20-22. Included in Part B. 

[126] "Scientific Aspects of Juggling," Typescript, circa 1980. Included in 

[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp. Included in 
this volume. 


Cover Sheet for Technical Memoranda
Research Department

subject: The Use of the Lakatos-Hickman Relay in a
Subscriber Sender - Case 20878

1 - Patent Dept. (letter 9/27/40)
2 - W. W. Ke^all, Case File
3 - T. C. Fry
4 - A. B. Clark
5 - B. D. Holbrook
6 - G. R. Stibitz
7 - G. V. King
8 - Miss Hanle

MM-40-130-179
date August 13, 1940
author C. E. Shannon



A study is made of the possibilities of using
the Lakatos-Hickman type relay for the counting, registering,
steering, and pulse apportioning operations in
a subscriber sender. Circuits are shown for the more
important parts of the circuit where it appears that the
new type relay would effect an economy.


The Use of the Lakatos-Hickman Relay in a Subscriber Sender -
Case 20878


August 15, 1940 


The Lakatos-Hickman type relay¹, using the relay springs
as part of the magnetic circuit, can be used as a very economical
type of pulse counter and registration device. In fact, one such
relay with twenty moving springs can count and register up to ten
pulses, while the same operation requires at least five ordinary
relays, and some standard circuits use as many as twenty to reduce
the spring loading on the relays and the contact loading in
the pulsing circuit. It has been suggested that this new type
of relay might be used for some or all of the many counting,
steering, and registration circuits in a subscriber type sender.
The present memorandum gives some circuits for accomplishing
this. The chief problem in the design of these circuits is
that of performing the various translating operations necessary
in converting the incoming pulses into group and brush selections,
or P.C.I. pulses as the case may be, without using more contact
elements than are available on the counting relay. Two different
solutions are given here. The first was made as economical as
possible, but at the cost of one disadvantage: under certain
conditions of contact failure in the thousands or hundreds register,
the sender will connect the subscriber to an incorrect number
rather than connecting to a tell-tale and giving him a busy signal.
The second circuit, which we will call the positive action
circuit², is designed to overcome this difficulty but does so at
the expense of more contacts and wiring. Some compromise between
these circuits may be the most desirable. The circuits by no
means represent a complete sender. It appears that the problems
connected with the office code (i.e. the first two or three
digits) can be handled without much difficulty. At any rate,
these circuits will depend on the type of decoder used, and
would represent a second stage in the design. We have therefore
designed what might be called a "four digit sender", considering
only the problems arising in the thousands, hundreds, tens and
units digits. We have also omitted consideration of the parts
of the circuit used for control and supervisory purposes, since
these can be easily handled by existing circuits and do not
directly involve the new type relay. Our chief purpose is to
show that the new type counter contains sufficient contact
elements for most of the steering and counting circuits of the
subscriber sender. It is always possible to add more contacts
at any stage in the new type counter by the arrangement of
springs in Fig. 1, but this would be undesirable from the
standpoint of standardization. At any rate, it was found that
even in the positive action circuit only two stages in one
register needed more contacts than are already available, and
two additional ordinary relays were introduced here to carry the
contact load.

¹See "Circuit Analysis for Lakatos-Hickman Type Relay",
G. R. Stibitz, MM40-150-1BO, Jan. 15, 1940, Case 20878.

²This circuit was suggested by Mr. G. V. King.

It should be pointed out that an extremely simple and
economical sender (i.e., much simpler than those given here)
could be designed using the new type counter were it not for
the peculiar translation codes involved. Thus if we could start
"from scratch" and design translation codes particularly adapted
to the characteristics of the new relay, the circuits could be
made very simple indeed. Even using the existing codes, which
were constructed to simplify the present type circuits, the use
of the new counter allows a remarkable simplicity and economy.

The circuits were designed by a combination of common
sense and Boolean algebra methods. We will omit the details
involved in their design. Although it is possible that a few
superfluous elements remain, it is doubtful that the circuits
can be simplified very much.

Figure 2 is a block diagram of the proposed sender.
In the present panel and crossbar senders, pulse counting is
done in the same circuit for each digit, and the numbers are
transferred from this counting circuit to a set of registering
circuits, one for each digit, through an incoming steering chain.
The registering circuits in the panel type sender consist of a
set of five ordinary relays per digit, while in the crossbar
system the A digit is registered on one or two verticals of a
crossbar switch. In Figure 2, on the other hand, each digit
has one of the new type counter relays, which acts both as a
pulse counter and as a register. The incoming steering chain
steers the incoming pulses to the correct counter-register
rather than steering the number recorded by the input pulse
counter to a digit register. The input steering chain may or
may not be one of the new type counters. The steering operation
can be done with the new type counter, but it appears to
require special devices, as for example polarized springs, in
order to energize both windings of the register relays after
receiving a digit. Even using the present type of steering
chain a great simplification is possible, for only one wire,
the pulsing lead, needs to be steered to the various digit
registers, rather than the five leads of the present type
sender. Another possibility is using a new type counter to
count the groups of pulses and operate a set of relays S_A, S_B,
S_C, S_Th, S_H, S_T, and S_U, which come in after the A, B, C, Th,
H, T, and U digits are received and energize both coils of the
corresponding registers.

After the digits are registered on the new type
counters, these numbers are translated by means of the contact
interconnections into the code corresponding to the incoming
brush, incoming group, final brush, tens, and units selections,
which are represented by a ground on one of the leads in the
groups marked IB, IG, FB, T, and U, respectively. These groups
of leads are connected in sequence to the revertive pulse counter
by means of the revertive group counter. The revertive pulse
counter will be one of the new type relays and is connected in
such a way as to open the fundamental circuit, and thus stop the
revertive pulsing, when it reaches the first ground. The revertive
group counter, or revertive steering chain, of course steps ahead
after each group of revertive pulses through the action of a slow
release relay. This last steering operation cannot be done solely
with one of the new type relays, for it is necessary to steer ten
leads in the tens and units digits. It could be done, however,
with a new type counter in conjunction with four ordinary relays.

In the case of a call to a manual office, the outputs
of the digit registers are translated by a P.C.I. circuit into
the correct P.C.I. codes. This circuit, too, can make use of the
new type counter in the quadranting operation, i.e. in apportioning
four quadrants to each of the four digits to be transmitted.
This would be done with a sixteen stage counter (or, if it is
desirable to have all counters with ten stages, two of these could
be connected "in series") replacing the present sequence switch.

Of course there must be an interlock between the incoming
and revertive steering chains to prevent any selection being
made before sufficient information has been received. This can
be done by fairly standard methods.

A rough comparison can be made between the relay requirements
of the present panel type sender and the design proposed here.
Omitting parts of the circuit which would be substantially the
same, the requirements are listed below (? marks a figure that
is illegible in the scan):

                        Panel Sender    Proposed Sender
                        Ordinary        New Type    Ordinary
Operation               Relays          Counters    Relays

Input Counting          ?               -           ?
Input Steering          ?               ?           ?
Registration            ?               ?           ?
Revertive Counting      ?               ?           ?
Revertive Steering      10              ?           ?

Total                   ?               ?           ?

In addition, a sequence switch is replaced by a new type counter.
These figures are based on the positive action circuit. The
other circuit uses 6 ordinary relays. This comparison of the
numbers of relays involved shows only a small part of the saving,
however. The wiring and fundamental method of operation of the
new circuit are much simpler, which tends both toward economy and,
provided the new relay can be made sufficiently reliable, toward
elimination of faults and errors.

It is a little more difficult to give a quantitative
comparison of the proposed sender with the present crossbar type
sender, due to the differences in the types of circuit elements
involved, but it appears that the saving would be of the same order
of magnitude.

The new type counter with ten stages acts like a series
of twenty relays which come in sequentially as the two coils of
the relay are alternately energized. Thus after n pulses the
first 2n relays are operated. If, after a series of pulses, only
one of the two coils on a counter remains energized, we can only
be sure of the contacts on that side. It was found that under
these conditions the number of contacts available was far too
small in all of the four registers for the various translating
operations necessary. We have therefore assumed that the steering
circuit should be designed in such a way as to energize both
coils of a counter after it has received its series of pulses*.
This insures the contacts on both sides, and each stage then has
the equivalent of two transfer contacts and two additional contacts
somewhat similar to a switchhook connection. Thus each
stage may be considered as a relay with the contacts available
indicated in Figure 3. Our circuit diagrams are drawn from
this point of view.
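The counting behaviour just described (ten stages acting like twenty relays that come in two at a time) can be modelled in a few lines. This is a minimal sketch under a simplifying assumption of ours: each pulse brings in the two equivalent relays of the next stage, and a stage counts as operated once both are in. The function names are hypothetical, not from the memorandum.

```python
from typing import List

STAGES = 10  # a ten-stage counter behaves like 2 * STAGES = 20 relays


def operated_relays(pulses: int) -> List[bool]:
    """States of the 20 equivalent relays after `pulses` pulses.

    Modelling assumption: each pulse brings in two more relays, so after
    n pulses exactly the first 2n relays are operated.
    """
    if not 0 <= pulses <= STAGES:
        raise ValueError("a ten-stage counter registers 0 to 10 pulses")
    return [index < 2 * pulses for index in range(2 * STAGES)]


def operated_stages(pulses: int) -> List[int]:
    """1-based numbers of the stages whose two relays are both operated."""
    relays = operated_relays(pulses)
    return [
        stage + 1
        for stage in range(STAGES)
        if relays[2 * stage] and relays[2 * stage + 1]
    ]


if __name__ == "__main__":
    print(sum(operated_relays(6)))  # 12
    print(operated_stages(6))       # [1, 2, 3, 4, 5, 6]
```

Treating each stage as a single relay with both sides' contacts available, as the text does, corresponds to reading `operated_stages` rather than the individual relay states.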

For the convenience of the reader we will list the
various translation codes used in the sender. The incoming
brush selection depends only on the thousands digit and is
given by the following table:

Thousands Digit    Incoming Brush

0, 1
2, 3
4, 5

*See the memorandum "Circuit Arrangement for Counting Relay with
Mechanically Independent Contact Springs", by B. D. Holbrook,
MM-40-130-149, July 5, 1940, Case 22108-1.

The incoming group selection depends on both the
hundreds and thousands digits and is given by the following:

< 6

< 5

Incoming Group



The final brush selection depends only on the hundreds
digit. We have the following code:

Hundreds Digit    Final Brush

0, 5
1, 6
2, 7
3, 8
4, 9




P.C.I. Code for Thousands Digit

It should be remembered that an incoming brush, incoming
group, or final brush selection of n corresponds to n + 1
revertive pulses. The same remark applies to the tens and hundreds
selection.
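The n + 1 rule, together with the revertive counter stopping at the first ground it reaches, can be sketched as a small software model. This is a hypothetical illustration, not a circuit from the memorandum: leads are numbered 0 to 9 as in the tens and units groups, and the counter is assumed to sit on lead k after k + 1 pulses.

```python
from typing import Iterable, Optional


def revertive_pulses(grounded_leads: Iterable[int], num_leads: int = 10) -> Optional[int]:
    """Number of revertive pulses before the counter stops on a ground.

    Hypothetical model: the counter reaches lead k on pulse k + 1 and opens
    the fundamental circuit at the first grounded lead, so a selection of n
    takes n + 1 pulses. Returns None when no lead is grounded.
    """
    grounded = set(grounded_leads)
    for pulses in range(1, num_leads + 1):
        if pulses - 1 in grounded:  # lead reached on this pulse
            return pulses
    return None


if __name__ == "__main__":
    print(revertive_pulses({6}))    # 7
    print(revertive_pulses({0}))    # 1
    print(revertive_pulses(set()))  # None
```

A selection of 6 in the tens register therefore stops the pulsing on the seventh revertive pulse in this model.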

Digits are sent to a call indicator by series of positive
and negative pulses, four for each digit. Two different
codes are used for this, one for the thousands digit and the
other for the hundreds, tens, and units. The thousands code is
an additive one based on the numbers 1, 2, 4, and 8 as follows:









Corresponding Additive 







The sum of the numbers corresponding to the columns in which a
digit has the symbol - gives that digit; hence the additive
property of the code. In this table I, II, III, and IV refer
to the four pulses or quadrants. In the first and third quadrants,
0 represents a ground and - represents a positive pulse. In the
even quadrants, 0 means a light negative pulse and -, a heavy
negative pulse. We have chosen this representation of the code
for comparison with the P.C.I. circuit, in which four leads are
grounded or not in accordance with the above table. Thus if the
digit 6 is registered in the thousands place, leads II and III in
a group I, II, III, IV are grounded. The presence or absence of
these grounds is translated into positive or negative pulses by
two relays TS and RS.
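The additive thousands code and the quadrant conventions can be illustrated with a short sketch. It assumes weights 1, 2, 4, 8 for quadrants I to IV, as stated above, and handles only the digits 1 to 9, because the scan does not preserve how the table codes 0. "Marked" means the digit has the symbol - in that quadrant; the function names are hypothetical.

```python
from typing import List, Set

# Weights of quadrants I, II, III, IV in the thousands P.C.I. code.
THOUSANDS_WEIGHTS = (1, 2, 4, 8)


def encode_thousands(digit: int) -> Set[int]:
    """Quadrants (1 to 4) marked '-' for `digit` (digits 1-9 only).

    Because the weights are powers of two, the decomposition is unique.
    """
    if not 1 <= digit <= 9:
        raise ValueError("only digits 1-9 are modelled here")
    return {q + 1 for q, w in enumerate(THOUSANDS_WEIGHTS) if digit & w}


def decode(marked: Set[int], weights=THOUSANDS_WEIGHTS) -> int:
    """Additive property: sum the weights of the marked quadrants."""
    return sum(weights[q - 1] for q in marked)


def pulse(quadrant: int, marked: bool) -> str:
    """Signal sent in one quadrant, following the conventions in the text."""
    if quadrant % 2 == 1:  # quadrants I and III
        return "positive" if marked else "ground"
    return "heavy negative" if marked else "light negative"  # II and IV


def signals(digit: int) -> List[str]:
    marked = encode_thousands(digit)
    return [pulse(q, q in marked) for q in range(1, 5)]


if __name__ == "__main__":
    print(sorted(encode_thousands(6)))  # [2, 3]
    print(signals(6))  # ['ground', 'heavy negative', 'positive', 'light negative']
```

With these weights, 6 = 2 + 4 is carried by quadrants II and III, and summing the marked weights recovers every digit from 1 to 9.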

The hundreds, tens, and units P.C.I. code is also additive,
based on the numbers 1, 2, 4, 5. Using the same conventions,
it is represented by the following table (? marks an entry that
is illegible in the scan):

P.C.I. Code for Hundreds, Tens, and Units Digits

H, T, or U    Quadrant
Digit         I     II    III   IV

1             -     0     0     0
2             0     -     0     0
3             -     -     0     0
4             ?     ?     ?     ?
5             ?     ?     ?     ?
6             ?     ?     ?     ?
7             ?     ?     ?     ?
8             ?     ?     ?     ?
9             ?     ?     ?     ?

Numbers       (1)   (2)   (4)   (5)

The circuit for the tens or units register is shown in Figure 4.
The operation is quite obvious. In the case of a full mechanical
call, if 6, for example, were dialed in the tens place, the first
six relays are locked in, which places a ground on the lead marked
6. These leads are connected through the revertive steering chain to
the revertive counter, which reaches this ground after the seventh
revertive pulse. The presence of this ground operates a relay
which opens the fundamental circuit and stops the pulsing.
A ground is also put on leads II and III for a P.C.I. call.
The operation of the P.C.I. circuit will be described later.
The thousands and hundreds register is shown in Figure 5 for the
positive action circuit and in Figure 6 for the more economical
circuit. In Figure 5, many of the contacts do double duty,
translating both for P.C.I. and full mechanical calls. This is
done through a relay P, which is operated for a manual call and
not for a mechanical call. In the hundreds register there were
not enough contacts available in the fifth and tenth stages.

The relays R and S are used to carry part of the contact load.
This circuit is designed so that one and only one of the IB, IG,
and FB leads is grounded for a given number. In case of a contact
failure none would be grounded, and the corresponding commutator
would supposedly go to a tell-tale. In the circuit of Figure
6, on the other hand, more than one of the IB, IG, or FB leads may
be grounded at the same time. Thus if the thousands digit is 8,
both 3 and 4 in the IB group are grounded. If the back contact
on 3 failed, the revertive pulse counter would not stop the pulsing
action at brush 3 as it should but would go on to the fourth brush.
However, this circuit is considerably simpler than Figure 5, and
does not appear worse than the present type of sender from the
standpoint of possible wrong numbers.
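The difference between the two registers under a contact failure can be illustrated with a small model. It assumes, in line with the revertive counter stopping at the first ground, that the counter stops at the lowest-numbered lead that is still grounded; the lead numbers and function name are illustrative only.

```python
from typing import Iterable, Optional


def stopping_lead(grounded: Iterable[int], failed: Iterable[int] = ()) -> Optional[int]:
    """Lead on which the revertive counter stops, or None if it finds no ground.

    `grounded` are the leads the register grounds for the dialed number;
    `failed` are leads whose ground is lost through a contact failure.
    """
    live = sorted(set(grounded) - set(failed))
    return live[0] if live else None


if __name__ == "__main__":
    # Hypothetical case: the intended selection is brush 3.
    print(stopping_lead({3}))                 # positive action circuit: 3
    print(stopping_lead({3}, failed={3}))     # positive action, failure: None
    print(stopping_lead({3, 4}))              # economical circuit: 3
    print(stopping_lead({3, 4}, failed={3}))  # economical, failure: 4 (wrong brush)
```

When exactly one lead is grounded, a failure leaves no ground at all, which is the condition the positive action circuit relies on to reach a tell-tale. An extra ground lets the counter run on to a wrong brush instead.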

The P.C.I. circuit is shown in Figure 7. Z is a relay
which is operated in the odd quadrants and not in the even
quadrants. TS and RS are relays whose windings are connected
sequentially through the P.C.I. impulse chain to first the thousands
P.C.I. leads I, II, III, and IV, then the hundreds, etc., according
to the following table:













[Table: successive connections of TS and RS to the Th, H, T, and U leads I, II, III, IV]




In the odd quadrants Z is operated, placing a ground on the
fundamental ring (FR). The fundamental tip (FT) is connected
through Z to either ground or positive battery according as
TS is operated or not. This depends, of course, on the condition
of the P.C.I. lead to which TS is connected at the time.
Similarly, in the even quadrants light or heavy voltage is
applied to FR according to the condition of RS, while FT is

Figure 8 shows the revertive steering chain and revertive
pulse counter.


FIG. 3

FIG. 4

FIG. 7





Claude E. Shannon


1. The deflection mechanism may be divided into three parts.
The first is driven by two shafts and has one shaft as output,
which feeds the second part. This unit has a single shaft output,
which serves as input to the third part, whose output is also a
single shaft, used as the desired azimuth correction.

2. The first unit is a simple integrator. Its output rate is

3. The second part is the same circuit as previous rate finders.
Its presence appears to be detrimental to the operation of
the system from several standpoints. The output e of this part
satisfies:

$e = x + y$


4. The third and most important part of the machine satisfies

q + R 4 + L q - • 

in which:

e = an input forcing function which, except for transients in
the second part and other small effects, is the function
whose rate is to be found.

q = the rate of e as found by the device. The output of the
mechanism is $\sin^{-1} q$.

R, L, S are positive constants depending on the gear ratios,
etc. in the machine.
The mechanism therefore acts like an R, L, C circuit in which
the differential inductance is a function of the current,

$\frac{L}{\sqrt{1 - q^2}}$

The system can be critically damped for differential displacements
near at most two values of the current.
Omitting the effect of backlash, the system is stable for any
initial conditions whatever with a linear forcing function
$e = At + B$. It will approach asymptotically, and possibly with
oscillation, a position where q is proportional to e. An error
function can be found which decreases at a rate $-R(q - q_0)^2$,
$q_0$ being the asymptotic value of q.
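The remark that critical damping is possible near at most two values of the current can be checked with a short calculation under an assumption of ours, not the memorandum's: for small displacements about a steady current q, the mechanism is treated as a linear series circuit with inductance $L/\sqrt{1-q^2}$, resistance R and elastance S. Critical damping then requires $R^2 = 4SL/\sqrt{1-q^2}$, i.e. $q = \pm\sqrt{1 - (4SL/R^2)^2}$, which exists only when $4SL \le R^2$. The parameter values below are illustrative.

```python
import math
from typing import List


def critical_currents(R: float, L: float, S: float) -> List[float]:
    """Currents q (|q| < 1) at which small displacements are critically damped.

    Assumed model: effective inductance L / sqrt(1 - q**2), resistance R and
    elastance S, so critical damping means R**2 == 4 * S * L / sqrt(1 - q**2).
    """
    ratio = 4.0 * S * L / R**2  # equals sqrt(1 - q**2) at critical damping
    if ratio > 1.0:
        return []  # effective inductance >= L, so always underdamped
    q = math.sqrt(1.0 - ratio**2)
    return [0.0] if q == 0.0 else [-q, q]


def damping_regime(q: float, R: float, L: float, S: float) -> str:
    """Classify small oscillations about the current q under the same model."""
    l_eff = L / math.sqrt(1.0 - q**2)
    disc = R**2 - 4.0 * S * l_eff
    if disc > 0:
        return "overdamped"
    if disc < 0:
        return "underdamped"
    return "critically damped"


if __name__ == "__main__":
    R, L, S = 1.0, 0.2, 1.0  # illustrative values, not from the memorandum
    print(critical_currents(R, L, S))    # approximately [-0.6, 0.6]
    print(damping_regime(0.0, R, L, S))  # overdamped
    print(damping_regime(0.9, R, L, S))  # underdamped
```

Because the effective inductance grows monotonically with |q|, the damping passes through the critical value at most once on each side of q = 0. That is where the "at most two values" comes from in this model.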

If the system is less than critically damped, ordinary gear
play type of backlash can and will cause oscillation. This
includes play in gears, adders, lead screws, rack and pinions,
and looseness of balls in the integrator carriages. The oscillation
is not unstable in the sense of being erratic or growing
without limit, but is of a perfectly definite frequency and
amplitude. This type of backlash acts exactly like a peculiarly
shaped periodic forcing function. Approximate formulas for
the frequency and amplitude of the oscillation are




/s 2 I UoLd -A) 2 

<* c 

^ and B 2 being the amounts of backlash in the two driven shafts 
as measured in a certain manner. 

8. Elastic deformations of shafts and plates can be divided into two parts. One is exactly equivalent to the gear type of backlash and may be grouped with $B_1$ and $B_2$ above. The other has the effect of altering the parameters R, L, S of the circuit and also adding higher order derivatives with small coefficients. This will slightly alter the time constant and the natural frequency of the system.

9. The manner in which the arcsin function is obtained seems to me distinctly disadvantageous to the operation of the system for a number of reasons, chiefly since to eliminate backlash oscillation it requires high overdamping near $\dot q = 0$, and this slows down the response for low target speeds.
10. The general problem of rate finding and smoothing is considered briefly from two angles - as a problem in approximating a certain given transfer admittance and as a problem in finding the form of a differential equation. The first method, based on a linear differential equation, leads to tentative designs which I think would be an improvement over the present one. The second method indicates the possibility of still more improvement if non-linear equations can be satisfactorily analyzed.


General Considerations. The deflection mechanism is a device designed to find $\delta_a$ mechanically from the formula

$$\sin\delta_a = \dot\varepsilon_a\,\varphi t_p,$$

having one shaft whose rate of turning is $\dot\varepsilon_a$ and another whose angular position is $\varphi t_p$, giving $\delta_a$ as the position of a shaft. The system is also supposed to smooth out small errors in $\dot\varepsilon_a$.
The mechanism, as actually constructed, is shown in Figure 1. By a rearrangement of adders, it may be drawn as shown in Figure 2. Incidentally, the device of rearranging and combining adder units is frequently useful in studying these systems. In this case it both clarifies the physical operation and simplifies the mathematical analysis. The box IV on the right of Fig. 1 represents two adders with, essentially, a common shaft. The output is equal to the sum of the inputs with the indicated signs prefixed. A variable associated with a shaft represents the angular position of that shaft unless specifically stated otherwise. Gears are omitted from the diagram but included as coefficients in the equations. It may also be worthwhile to point out that the best method of setting down the equations of such a system is usually the following:

1. Considering only the integrators and function devices, label the various shafts using the minimum number of variables, working backward from driven to driving shafts. Thus if the output of an integrator is labeled z, its displacement is $\dot z$ (assuming constant disk rate). If the output of an x to ln x gear is sin u, its input is $e^{\sin u}$. Working backwards gives the differential instead of the integral form of the equation.

2. Now concentrate on the adders, grouping together as many as possible, and write the equations of constraint. These will be the equations of the system.

I find the use of electrical analogues very useful in understanding these devices and have used throughout a notation which emphasizes this idea.

As the machine is drawn in Fig. 2, it consists of three independently operating units. The output of the first is a single shaft serving as input to the second, the output of the second a single shaft feeding the third, and the output of this being a shaft used as $\delta_a$.

The operation is roughly as follows: Integrator I multiplies its disk rate by its displacement, so that the rate of turning of its output is $\dot y = \varphi t_p\,\dot\varepsilon_a$. The actual position of this y shaft can carry no significance. It is

$$y = \int \varphi t_p\,\dot\varepsilon_a\,dt + y_0,$$

a variable which depends on the entire previous history of the sighting telescopes, to say nothing of possible integrator slippage. At two different times, with a target at the same position and speed, this shaft would have entirely different angular positions but the same rate of turning.

The output of integrator I feeds into the middle part of the system, which is exactly the rate finder of most older directors. This part of the device seems to me not only superfluous but actually detrimental to the operation. It is equivalent to an R, L circuit (Fig. 3) with impressed voltage y and output x, the voltage across the inductance

3. A small response h(t) for the function g(t).

High frequencies in g(t) appear practically undiminished and in the same phase in h(t), since the impedance of the inductance is high compared to R.

$$x = \frac{L}{R}A + Ke^{-Rt/L} + h(t)$$

In adder III, x is added to y in equal proportions to give e:

$$e = y + \frac{L}{R}A + Ke^{-Rt/L} + h(t)$$

As we pointed out above, y already contains an irrelevant additive constant, so the addition of another, $(L/R)A$, which happens to be proportional to the target rate, is of no possible significance. The term $Ke^{-Rt/L}$ certainly is only detrimental, being an unwanted transient. For a time I thought that the reason for the middle part of the machine was the final term h(t). For high frequencies this is approximately g(t), and might be used to buck out these high frequency following errors, much as was done in some early radio circuits to reduce a-c hum. However, a study of the design diagrams shows that the two error functions are actually in phase, as I have indicated in the equation, so that these high frequency errors are added, making the situation worse. Even if the phase of x were reversed on entering adder III, I think it doubtful whether the presence of this part of the system would be justifiable. It would be necessary to show that the error frequencies were high enough that the two actually did cancel, and also that the disadvantages of the transient term did not overcome the advantages obtained. Note that the middle part can function in no way as a rate finder. The right hand part of the machine does its own rate finding, as we will see, and the rate found by the middle part could not possibly be used because of the undetermined constant in y.
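A minimal numerical sketch of the middle part's steady state, assuming a noise-free ramp $y = At + B$ (so $h(t) = 0$) and illustrative values of R and L. The helper name `rl_middle_part_output` and the explicit Euler integration are choices made here, not part of the original machine. Whatever the additive constant B, the output settles to $x = (L/R)A$, the constant term discussed above.

```python
def rl_middle_part_output(y, R, L, dt=1e-3, t_end=20.0, i0=0.0):
    """Voltage x across the inductance of a series R, L circuit driven by y(t).

    Integrates L di/dt + R i = y(t) with explicit Euler steps and returns
    x = y - R i at t_end. Names and step sizes are illustrative only.
    """
    i, t = i0, 0.0
    for _ in range(int(round(t_end / dt))):
        i += dt * (y(t) - R * i) / L   # L di/dt = y - R i
        t += dt
    return y(t) - R * i                # x = L di/dt, the inductance voltage
```

For $y = 3t + 1$ with $R = 2$, $L = 0.5$ the returned value is $LA/R = 0.75$, and changing the constant 1 to 100 leaves it unchanged, which is why this term carries no information that y itself does not.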

We proceed now to the third part of the machine, which is the major concern of the study. Concentrating on adder IV, the equation of the system is obviously

$$L\frac{d}{dt}\sin^{-1}\dot q = e - Sq - R\dot q,$$

or

$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^{2}}} = e.$$
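The steady behaviour of this equation can be checked numerically. The sketch below is a hypothetical helper (`simulate_deflection`) that integrates the equation with a classical Runge-Kutta step for a linear input $e = At + B$. The constants R = 2, L = 1, S = 1 are illustrative, not the machine's gear ratios, and the stop on the sine gear is not modelled, so the sketch assumes $|\dot q|$ stays below 1.

```python
import math

def simulate_deflection(A, B, R, L, S, q0=0.0, qd0=0.0, dt=0.01, t_end=40.0):
    """RK4 integration of S q + R q' + L q'' / sqrt(1 - q'^2) = A t + B.

    Returns (t, q, qd) at t_end. Assumes |q'| stays below 1 (no stop modelled).
    """
    def accel(t, q, qd):
        # Solve the equation for q'' given the current state.
        return (A * t + B - S * q - R * qd) * math.sqrt(1.0 - qd * qd) / L

    t, q, qd = 0.0, q0, qd0
    for _ in range(int(round(t_end / dt))):
        k1q, k1v = qd, accel(t, q, qd)
        k2q, k2v = qd + dt / 2 * k1v, accel(t + dt / 2, q + dt / 2 * k1q, qd + dt / 2 * k1v)
        k3q, k3v = qd + dt / 2 * k2v, accel(t + dt / 2, q + dt / 2 * k2q, qd + dt / 2 * k2v)
        k4q, k4v = qd + dt * k3v, accel(t + dt, q + dt * k3q, qd + dt * k3v)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        qd += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return t, q, qd
```

With $A = 0.5$, $B = 0$ the rate shaft settles at $\dot q = A/S = 0.5$, the output $\sin^{-1}\dot q$ at $\pi/6$, and q at the equilibrium value $(At + B)/S - RA/S^2$.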

This is the equation of a series R, L, C circuit with the inductance a function of the current passing through it. Inductance may be defined by the Lagrangian equations or by

$$v = \frac{d}{dt}(\Lambda i),$$

and it is clear from the above equation that

$$\Lambda i = L\sin^{-1} i, \qquad \text{or} \qquad \Lambda = \frac{L\sin^{-1} i}{i}.$$

This function varies as shown in figure 4. For our work, however, a more useful parameter is what is sometimes called the differential inductance, which may be defined by

$$L_D = \frac{d(\Lambda i)}{di},$$

so that in our case

$$L_D = \frac{L}{\sqrt{1-i^{2}}}.$$
This inductance is useful when we have an equilibrium current $\dot q_0$ and are considering the effect of small variations about this equilibrium. Omitting second order terms, the system will be equivalent to one with constant R, L, S parameters, the inductance being taken as $L_D$. The variation of $L_D$ with current is shown in figure 5.

The action is the opposite of that of a "swinging" choke where, because of saturation, the differential inductance decreases with large currents.
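Both inductance curves (Figures 4 and 5) follow directly from $\Lambda i = L\sin^{-1} i$. The following small sketch uses hypothetical function names and L = 1 by default. It evaluates both curves and can be used to confirm numerically that $d(\Lambda i)/di$ equals $L/\sqrt{1-i^2}$, and that $L_D$ is never below $\Lambda$.

```python
import math

def inductance(i, L=1.0):
    """Lambda(i) = L asin(i) / i, so that the flux Lambda(i) * i = L asin(i); Lambda(0) = L."""
    return L if i == 0.0 else L * math.asin(i) / i

def differential_inductance(i, L=1.0):
    """L_D(i) = d(Lambda i)/di = L / sqrt(1 - i^2); grows without bound as |i| -> 1."""
    return L / math.sqrt(1.0 - i * i)
```

At $i = 1$, $\Lambda$ reaches only $\pi L/2$, while $L_D$ is unbounded. This is the "anti-swinging-choke" behaviour described above.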

The mechanical idea behind the operation of this system is quite simple. Suppose shaft e to be turning at a constant rate. The system will be in equilibrium if the displacement of integrator V is such as to make its output feeding into the adder equal and opposite to e, and the displacement of integrator VI is zero. Under these conditions, shaft $\dot q$ measures the rate of e, and shaft V, the output of the device, the arcsin of this rate. If the rates are not correct, the adder changes the second derivative shaft in such a direction as to equalize the rates. The $\dot q$ shaft serves as a damper to prevent continual oscillation about the equilibrium


MATHEMATICAL THEORY (Backlash not Present) 

Differential Operation 

If e is turning at a constant rate and the system is at equilibrium, and then a small differential disturbance is applied to the system, it will clearly respond very nearly like an R, L, C circuit with constant parameters, the inductance used being the differential inductance for the equilibrium current $\dot q_0$:

$$L_{\text{eff}} = \frac{L}{\sqrt{1-\dot q_0^{2}}}.$$

Such a system has a time constant of

$$T = \frac{2L_{\text{eff}}}{R} = \frac{2L}{R\sqrt{1-\dot q_0^{2}}}.$$

It is critically damped if

$$R^2 = 4L_{\text{eff}}S = \frac{4LS}{\sqrt{1-\dot q_0^{2}}},$$

which, of course, only occurs at

$$\dot q_0 = \pm\sqrt{1 - \frac{16L^2S^2}{R^4}}.$$

For values of $\dot q_0$ greater in absolute value than this, the system is oscillatory; for values less, overdamped.
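These formulas can be packaged as small helpers for exploring a choice of R, L, S: where the critical points fall, and whether a given equilibrium current is overdamped or oscillatory. The helper names are hypothetical.

```python
import math

def critical_current(R, L, S):
    """|q0'| at which R**2 == 4 * L_D * S, with L_D = L / sqrt(1 - q0'**2).

    Returns None when 4*L*S > R**2: the circuit is then oscillatory at every current.
    """
    ratio = 4.0 * L * S / R**2          # equals sqrt(1 - q0'^2) at critical damping
    if ratio > 1.0:
        return None
    return math.sqrt(1.0 - ratio**2)

def time_constant(R, L, qd0):
    """T = 2 L_eff / R for small disturbances about the equilibrium current qd0."""
    return 2.0 * L / (R * math.sqrt(1.0 - qd0**2))

def damping_regime(R, L, S, qd0):
    """Classify small-signal behaviour about qd0 by comparing R**2 with 4 L_eff S."""
    four_LS = 4.0 * L * S / math.sqrt(1.0 - qd0**2)
    if math.isclose(R**2, four_LS):
        return "critical"
    return "overdamped" if R**2 > four_LS else "oscillatory"
```

For example, with R = 2, L = 1, S = 0.5 the critical points lie at $|\dot q_0| = \sqrt{0.75} \approx 0.87$. The mechanism is overdamped at $\dot q_0 = 0.5$ and oscillatory at $\dot q_0 = 0.95$.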


Proof of General Stability with Linear e

In proving the stability of this system, I have used a method which may be new in some respects. It was suggested by the fact that in a non-dissipative mechanical system, the potential energy U is a minimum at a point where the system is differentially stable, and the method is, in a sense, a generalization of that criterion. It is not, however, limited to differential stability, or to non-dissipative systems. Since the method may be of use in other investigations of this type, I will first describe it in general terms.

Suppose we have a differential equation system in which n variables and derivatives may be specified independently in the initial conditions. We will say that the system is stable for all initial conditions and all driving functions if any two solutions of the system with the same driving functions approach each other in the sense that

$$\lim_{t\to\infty}\sum_{i=1}^{n}|x_i - y_i| = 0,$$

where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the other. If this limit is zero for certain types of driving functions, we will say the system is stable for these functions.

Theorem: If a continuous function $Q(x_1, \ldots, x_n, y_1, \ldots, y_n, t)$ can be found having the following properties:

1. $Q \ge 0$ for all $x_i$, $y_i$, $t$, the equality holding if and only if $x_i = y_i$.

2. $dQ/dt \le 0$ at all times, when the $x_i$ and $y_i$ are solutions of the system with the same driving function.

3. It is impossible for $Q$ to approach a limit $\lambda \ne 0$ as $t \to \infty$.

Then the system is completely stable.

For the function Q is non-increasing but always $\ge 0$, and must therefore approach a limit $\lambda \ge 0$ as $t \to \infty$; but by 3, $\lambda \ne 0$ is impossible, hence $\lambda = 0$, and each $|x_i - y_i| \to 0$.

Conversely, it can be shown that if only a single forcing function is involved, and the system is stable for this function, a Q exists of the type described.

Roughly, the method is to find a "distance" or "error" function Q between two solutions which is zero only when the solutions are identical and which always decreases.

As an example of this method it is easy to prove the complete stability of the ordinary R, L, C circuit with constant parameters without solving the equation. The differential equation is

$$Sq + R\dot q + L\ddot q = e,$$

and we choose $q$ and $\dot q$ as coordinates. Let two solutions be $q_1, \dot q_1$ and $q_2, \dot q_2$, and consider the function

$$Q = \frac{S}{2}(q_1 - q_2)^2 + \frac{L}{2}(\dot q_1 - \dot q_2)^2.$$

Condition 1 is obviously satisfied. Now

$$\frac{dQ}{dt} = S(q_1 - q_2)(\dot q_1 - \dot q_2) + L(\dot q_1 - \dot q_2)(\ddot q_1 - \ddot q_2) = -R(\dot q_1 - \dot q_2)^2 \le 0.$$
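The example can be checked numerically without using the closed-form solution. Under illustrative constants and a sinusoidal driving function chosen here, this sketch integrates two solutions that share the same e(t) and records Q along the way. Q should never increase and should approach zero. The helper names are hypothetical.

```python
import math

def rk4_step(f, t, x, dt):
    """One classical Runge-Kutta step for the system x' = f(t, x), with x a list."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + dt, [xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def error_function_history(R=0.5, L=1.0, S=2.0, e=math.sin, dt=0.01, t_end=30.0):
    """Q = S/2 (q1 - q2)^2 + L/2 (q1' - q2')^2 sampled along two solutions of
    S q + R q' + L q'' = e(t) that share e but start from different states."""
    def f(t, x):
        q, qd = x
        return [qd, (e(t) - S * q - R * qd) / L]

    x1, x2, t, history = [1.0, 0.0], [-0.5, 2.0], 0.0, []
    for _ in range(int(round(t_end / dt))):
        dq, dqd = x1[0] - x2[0], x1[1] - x2[1]
        history.append(0.5 * S * dq * dq + 0.5 * L * dqd * dqd)
        x1, x2 = rk4_step(f, t, x1, dt), rk4_step(f, t, x2, dt)
        t += dt
    return history
```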


$$\cdots + \frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right)^{2}$$

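Obviously the minimum of Q with respect to q occurs at

$$q = \frac{At}{S} + \frac{B}{S} - \frac{RA}{S^{2}}.$$

Also

$$\frac{\partial Q}{\partial \dot q} = \frac{L\left(\dot q - A/S\right)}{\sqrt{1-\dot q^{2}}},$$

which vanishes only for $\dot q = A/S$. It is readily verified that this is a minimum, and that Q is zero at this point for any t. Now

$$\frac{dQ}{dt} = \frac{\partial Q}{\partial t} + \frac{\partial Q}{\partial q}\dot q + \frac{\partial Q}{\partial \dot q}\ddot q = S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)\frac{L\ddot q}{\sqrt{1-\dot q^{2}}},$$

and

$$\frac{L\ddot q}{\sqrt{1-\dot q^{2}}} = At + B - Sq - R\dot q$$

if $q$ and $\dot q$ satisfy

$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^{2}}} = At + B.$$

Hence

$$\frac{dQ}{dt} = \left(Sq - At - B + \frac{RA}{S}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)\left(At + B - Sq - R\dot q\right) = -R\left(\dot q - \frac{A}{S}\right)^{2}.$$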

Note that this rate is identical with that found in the linear case. Incidentally, it was by working backward from this rate that a suitable function Q was first found.
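The part of Q that depends on $\dot q$ is defined on a page missing from this copy. For a numerical check, the sketch below uses one function consistent with what survives: its $\dot q$ term has $\partial Q/\partial \dot q = L(\dot q - A/S)/\sqrt{1-\dot q^2}$ and vanishes at $\dot q = A/S$. That term is a reconstruction made here, not necessarily the author's expression. The helper names and constants are illustrative.

```python
import math

def lyapunov_Q(t, q, qd, A, B, R, L, S):
    """Error function Q for S q + R q' + L q'' / sqrt(1 - q'^2) = A t + B.

    The q-term follows the text; the q'-term is a reconstruction chosen so that
    dQ/dq' = L (q' - A/S) / sqrt(1 - q'^2) and Q = 0 at the equilibrium.
    """
    c = A / S                                   # equilibrium rate
    X = q - A * t / S - B / S + R * A / S**2    # displacement from the equilibrium q
    q_dot_term = L * (math.sqrt(1 - c * c) - math.sqrt(1 - qd * qd)
                      - c * (math.asin(qd) - math.asin(c)))
    return 0.5 * S * X * X + q_dot_term

def Q_history(A=0.5, B=0.0, R=1.0, L=1.0, S=1.0, q0=0.0, qd0=0.0, dt=0.01, t_end=40.0):
    """Integrate the equation with RK4 and record Q at every step."""
    def accel(t, q, qd):
        return (A * t + B - S * q - R * qd) * math.sqrt(1 - qd * qd) / L

    t, q, qd, history = 0.0, q0, qd0, []
    for _ in range(int(round(t_end / dt))):
        history.append(lyapunov_Q(t, q, qd, A, B, R, L, S))
        k1q, k1v = qd, accel(t, q, qd)
        k2q, k2v = qd + dt / 2 * k1v, accel(t + dt / 2, q + dt / 2 * k1q, qd + dt / 2 * k1v)
        k3q, k3v = qd + dt / 2 * k2v, accel(t + dt / 2, q + dt / 2 * k2q, qd + dt / 2 * k2v)
        k4q, k4v = qd + dt * k3v, accel(t + dt, q + dt * k3q, qd + dt * k3v)
        q += dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        qd += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return history
```

With R = 1 the system is underdamped at the equilibrium, so $\dot q$ overshoots $A/S$ and oscillates. Even so, the recorded Q only ever decreases, as $dQ/dt = -R(\dot q - A/S)^2$ requires.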

For Q to approach a limit K > 0, it is necessary for $\dot q$ to approach $A/S$, and q therefore to approach a linear function of t differing by a constant from its equilibrium value. But then, from the original differential equation, $\ddot q$ must approach a constant different from zero, which contradicts $\dot q \to A/S$. This does not, however, quite complete the stability proof, due to a certain mechanical peculiarity of the system. Let us plot the equilevel lines of Q against axes $X = q - At/S - B/S + RA/S^{2}$ and $Y = \dot q$ (Figure 6).

The x to sin x gear in the actual mechanism has a limited movement, and is prevented from going too far by a slip clutch and stop. If $|\dot q| = K$, the stop prevents $|\dot q|$ from increasing any more. The original equation is replaced by

$$\dot q = \pm K$$

until the pressure on the stop reverses. So far we have shown that under the original equation Q always decreases. In terms of our plot this means that if we start a solution inside the curve marked C, the solution will certainly converge to the equilibrium position, for the solution can never "escape" from C and hit one of the two lines $\dot q = \pm K$, where the differential equation changes. When we are not on one of these lines a solution will, in fact, spiral inward in the clockwise sense, as may be seen by writing the differential equation in the form

$$S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^{2}}\right) + R\left(\dot q - \frac{A}{S}\right) = -\frac{L\ddot q}{\sqrt{1-\dot q^{2}}}.$$

Consider the signs of $\ddot q$ and $(\dot q - A/S)$ in the four quadrants about the equilibrium position. In I, for example, $(\dot q - A/S) > 0$ and the X coordinate of a solution must increase with t; $\ddot q < 0$, so $\dot q$ must decrease, giving a clockwise sense to the motion. Similarly the other quadrants may be verified. Some of the solutions starting outside of C will hit one of the lines, but the solution will still be stable. It is easy to show, by a study of the signs of the variables and their rates, that a solution can only hit the upper line to the left of the point $P_1$ with coordinates $X = \frac{R}{S}\left(\frac{A}{S} - K\right)$ and $Y = K$, and that if one does, it will move along the line to the right until it reaches $P_1$ and then return to the original equation. A similar situation holds for the lower line. If we should start a solution on the upper line to the right of $P_1$ it would leave the line immediately. The solution is always horizontal (i.e. $\ddot q = 0$) on the line through $P_1$, the equilibrium point and $P_2$.
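The effect of the stop can be illustrated with a crude simulation. It assumes a stop at $|\dot q| = K$ that simply sets $\ddot q = 0$ while the equation would push $|\dot q|$ further, and it uses explicit Euler steps. Starting far outside C, the solution runs onto the upper line, slides along it, leaves it, and still converges to $\dot q = A/S$. The helper name and all constants are illustrative.

```python
import math

def simulate_with_stop(A, B, R, L, S, K, q0, qd0, dt=1e-3, t_end=60.0):
    """Explicit-Euler sketch of S q + R q' + L q''/sqrt(1 - q'^2) = A t + B with a stop at |q'| = K.

    While q' sits on the stop and the equation would push |q'| further, q'' is set to zero.
    Returns (final q', largest |q'| seen).
    """
    t, q, qd = 0.0, q0, qd0
    largest = abs(qd)
    for _ in range(int(round(t_end / dt))):
        qdd = (A * t + B - S * q - R * qd) * math.sqrt(1.0 - qd * qd) / L
        if (qd >= K and qdd > 0.0) or (qd <= -K and qdd < 0.0):
            qdd = 0.0                          # pressure on the stop: q' held at +/-K
        q += dt * qd
        qd = max(-K, min(K, qd + dt * qdd))    # never step past the stop
        t += dt
        largest = max(largest, abs(qd))
    return qd, largest
```

Starting from $q = -5$, $\dot q = 0$ with $A = 0.3$, $K = 0.8$, the largest $|\dot q|$ recorded equals K, which shows the solution spent time on the stop. The final $\dot q$ is nonetheless $A/S = 0.3$.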

If R = 0 the function Q is constant, since $dQ/dt = 0$, and therefore the solutions of the equation

$$Sq + \frac{L\ddot q}{\sqrt{1-\dot q^{2}}} = At + B$$

are the equilevel curves in Figure 6.

I have attempted in several different ways to generalize this proof for arbitrary input functions e(t), but so far have no completely rigorous proof. However, some of the arguments come so near as to make me almost certain of complete stability. It can be shown, for example, that two different solutions with the same e(t) cannot definitely diverge, i.e. $|q_1 - q_2| + |\dot q_1 - \dot q_2|$ cannot become and remain greater than some positive constant (assuming $e$ and $\dot e$ bounded). Also if two solutions get close together (with respect to both $q$ and $\dot q$), they will certainly con-

The Effect of Backlash

In order to understand how backlash can cause oscillation, let us first consider a much simplified case. Suppose we have a second order linear system which is less than critically damped with no backlash (Figure 7):

$$Sq + R\dot q + L\ddot q = e.$$

If, at t = 0, we suddenly impress e = E (constant) on the system ($q = \dot q = 0$), the response is a damped oscillation (Figure 8).


Now in the mechanical system there are only two driven shafts, A and B, and backlash only affects (directly) the operation of these. Probably the greatest part of the backlash is in the adder driving shaft A. Let us assume for a moment that this is the only backlash present and that its action is as follows. When shaft A reverses direction (i.e. when $\dot q = 0$) there is a short period during which A is held stationary while the e shaft turns through an amount $B_1$, the backlash as measured from the e shaft. It is clear that the response of the system with backlash is the same as the response would be if the system had no backlash and, at the time $\dot q = 0$ (q previously decreasing, about to increase), we turned the e shaft $B_1$ units in such a way as to hold A stationary during this turning. Similarly, at the next reversal we turn e through $B_1$ in the opposite direction, keeping A constant through the period of backlash. In other words, the response is that of a system without backlash on which we impress as forcing function a wave which is about as shown in Figure 9.


If the periods of backlash are comparatively short, the small connecting portions (actually quadratic polynomials in time) will have little effect on the response. That is, we can assume a square topped wave with little error in $\dot q$, and in q especially, due to the smoothing operation of the integrators (or, said another way, due to the high impedance of the circuit to high frequencies).
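The square-wave equivalence rests on the behaviour of a simple play element. Here is a minimal sketch, assuming an idealised symmetric dead band of total width `gap`. This is a generic model chosen for illustration, not a description of the actual gearing. After each reversal the driven shaft stays put until the driver has taken up the play, so driver minus driven flips between +gap/2 and -gap/2: a square wave added to the driving motion.

```python
def backlash(inputs, gap, start=0.0):
    """Gear-play ('dead band') element: the driven shaft moves only once the
    driving shaft has taken up the play, so the two differ by at most gap/2."""
    half = gap / 2.0
    out, y = [], start
    for x in inputs:
        if x - y > half:        # driver pushing forward on one face
            y = x - half
        elif y - x > half:      # driver pushing back on the other face
            y = x + half
        out.append(y)           # otherwise the driven shaft stays put
    return out
```

For the driver sequence 0, 1, 2, 1.8, 1.5, 1 with a gap of 0.5, the driven shaft reads 0, 0.75, 1.75, 1.75, 1.75, 1.25. It stands still while the driver reverses through the full gap, and afterwards the offset has changed sign from +0.25 to -0.25.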

Now suppose that there is a certain amount of backlash in shaft B. The action of this is to cause the carriage of the upper integrator to remain stationary for a small period when $\ddot q = 0$. The same effect would be achieved if, at this time, we suddenly impressed on e a pulse which held the lower integrator at zero, and kept changing e at such a rate as to keep the lower integrator there. We keep the integrator at zero long enough that its output would have turned an amount equal to the backlash in B, and then suddenly return it to its proper value. This means that the area of the pulse must equal the backlash. The shape of this pulse would be a linear function of time, but here again it is not highly significant.

The entire system may thus be replaced by one which is free of backlash and subject to a driving function of the type shown in Figure 10, where $B_1$ is the backlash in A as measured from e and $B_2$ is the amount in B as measured from e (in the sense that if e covers an area $B_2$, shaft B moves an amount equal to its backlash).

It is easy to see from our diagram that this forcing function is in the correct phase to sustain the oscillation instead of letting it decay.

The fundamental component of this forcing function is easily found. We have

$$A_1 = \frac{2}{T}\int_0^T e\,\sin\frac{2\pi t}{T}\,dt.$$

e may be split into a sum: one term for the square wave and one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated near the center of the sine wave, where it is nearly unity. Hence, with $f_0 = 1/T$,

$$A_1 = \frac{4}{T}\int_0^{T/2}\frac{B_1}{2}\sin\frac{2\pi t}{T}\,dt + \frac{4B_2}{T} = \frac{2B_1}{\pi} + 4f_0B_2.$$

The period T of this oscillation is the natural damped period of the system, to within a small error of size comparable to the length of time during which backlash is effective. Hence its frequency is approximately

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{S}{L_D} - \frac{R^2}{4L_D^2}},$$

and the magnitude of the fundamental component of the response $\dot q$ is

$$|\dot q_1| = \frac{2B_1/\pi + 4f_0B_2}{\sqrt{R^2 + \left(\omega_0 L_D - S/\omega_0\right)^2}}, \qquad \omega_0 = 2\pi f_0.$$

Providing the quantity $2B_1/\pi + 4f_0B_2$ is small, the deflection mechanism will behave linearly about its equilibrium position and the above formulae would approximately hold. If $\dot q_0 \ne 0$, the equilibrium value of inductance $L_D$ would probably be as good as any to use, since the differential inductance is greater on one side and less on the other. At $\dot q_0 = 0$ the inductance is greater on each side and a somewhat higher value should be used, depending on $2B_1/\pi + 4f_0B_2$. If the system is more than critically damped, q may or may not have an inflection point, depending on the initial conditions. If they are such that the driven shafts do not reverse, backlash cannot take effect and there should be no oscillation. However, if they do reverse once, the system may receive the equivalent of a "kick" in such a direction as to cause another reversal and so on, so that oscillation is set up. This problem has not been very well decided, but if this happens, the amplitude formula above should still hold, while the frequency formula will not.
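The frequency and amplitude formulas above can be evaluated with a small helper (hypothetical name), taking $L_D$ as the inductance to use, per the preceding discussion. It assumes the circuit is underdamped, since otherwise the frequency formula does not apply.

```python
import math

def backlash_oscillation(R, L_D, S, B1, B2):
    """Approximate frequency f0 and fundamental amplitude of q' for backlash oscillation.

    f0 = (1/2pi) sqrt(S/L_D - R^2/(4 L_D^2)); amplitude = (2 B1/pi + 4 f0 B2) / |Z(omega0)|.
    """
    disc = S / L_D - R**2 / (4.0 * L_D**2)
    if disc <= 0.0:
        raise ValueError("not underdamped: the frequency formula does not apply")
    omega0 = math.sqrt(disc)
    f0 = omega0 / (2.0 * math.pi)
    forcing = 2.0 * B1 / math.pi + 4.0 * f0 * B2      # fundamental of the equivalent forcing
    impedance = math.sqrt(R**2 + (omega0 * L_D - S / omega0) ** 2)
    return f0, forcing / impedance
```

Because the amplitude is linear in $B_1$ and $B_2$, halving the backlash halves the oscillation. This is the proportional reduction noted below.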


The question of "spring backlash", i.e. undesired effects due to elastic deformations of shafts and mounting plates, has been raised. According to Hooke's Law the angular strain in a shaft is proportional to the applied torque. This torque in a shaft whose position is x(t) can probably be very well approximated by an equation of the form

$$\tau = \pm K_1 + K_2\dot x + K_3\ddot x,$$

the first term, whose sign is that of $\dot x$, being due to a coulomb friction load, the second to a viscous friction load and the third an accelerating torque.

It is clear that the coulomb friction term $K_1$ can be combined with the ordinary gear type backlash treated above, and acts, therefore, like a periodic forcing function. The effect of the other terms is quite different: their presence causes small changes in the parameters R and L of the circuit and also adds higher derivatives to the equation. Let us consider only the spring in the shafts feeding $L\ddot q$ (i.e. assume $\dot q$ driven

$$(Sq - P_1\dot q - P_2\ddot q), \qquad (R\dot q - r_1\ddot q - r_2\dddot q)$$



$$Sq + (R - P_1)\dot q + \left(\frac{L}{\sqrt{1-\dot q^{2}}} - P_2 - r_1\right)\ddot q - r_2\dddot q = (e - \alpha_1\dot e - \alpha_2\ddot e) = e_1(t)$$

Spring in the drive to $\dot q$ has a similar effect, although complicated by the non-circular sine gears.

If e is a linear function of t, so is $e_1$, and the forcing function thus contains nothing to create a sustained oscillation. The left-hand side differs only by small quantities from the ideal

$$Sq + R\dot q + \frac{L\ddot q}{\sqrt{1-\dot q^{2}}} = e_1,$$

and will therefore surely approach the solution $\dot q = A/S$.

Thus we see that the "spring type" of backlash cannot cause sustained oscillation as the "gear" type of backlash can. However, if the gear type is present, the spring type can aid oscillation by reducing the damping; it may be necessary to overdamp in some cases in order to get an effective critical damping.
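A rough way to see how the spring terms eat into the damping is to linearise the modified equation about an equilibrium and compare damping ratios $\zeta = R/(2\sqrt{L_{\text{eff}}S})$, where $\zeta < 1$ is oscillatory. The sketch below neglects the small third-derivative term $r_2\dddot q$ and uses $L_D - P_2 - r_1$ as the effective inductance. Both simplifications, and the helper names, are assumptions made here.

```python
import math

def damping_ratio(R, L_eff, S):
    """zeta = R / (2 sqrt(L_eff S)) for a series R, L, S circuit; zeta < 1 is oscillatory."""
    return R / (2.0 * math.sqrt(L_eff * S))

def damping_with_spring(R, L_D, S, P1, P2, r1):
    """Damping ratio of the linearised spring-modified equation, neglecting r2 q'''.

    The q' coefficient becomes R - P1 and the q'' coefficient L_D - P2 - r1.
    """
    return damping_ratio(R - P1, L_D - P2 - r1, S)
```

For instance, a design that is exactly critical without spring (R = 2, $L_D$ = 1, S = 1) becomes oscillatory with $P_1 = 0.1$. R has to be raised to 2.1 to restore critical damping, which is the sense in which one may need to "overdamp".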

It should be pointed out that the gear type of backlash may not be quite as simple as we have assumed, particularly in the shafts driving $L\ddot q$. If the integrator carriage load is large compared to the friction loads in the adders and gears, then we are probably justified in assuming that gear pressures in the drive only reverse when the driven shaft reverses. However, if this is not the case, a backlash effect can easily take place at other times, for example when one of the shafts feeding the adder reverses, without necessarily reversing the driven shaft. The situation could become quite complicated, the equivalent input function containing several different sized steps occurring at different times; however, the fundamental frequency should still be approximately the natural damped frequency of the system, providing the backlash effects are small and occur only during a small fraction of the time.

The fact that backlash can cause a sustained oscillation leads to a criticism of the design of the mechanism, in particular of the method whereby the arcsin function is obtained. Note that reducing the amount of gear backlash $2B_1/\pi + 4f_0B_2$ will reduce the amplitude of oscillation proportionately, but apparently the only way to eliminate it completely is to at least critically damp the system for all equilibrium points, so that the shafts do not, in general, reverse direction. In the deflection mechanism as it stands, this would be distinctly disadvantageous, for if we critically damp at the maximum values of $|\dot q|$ (the governing points) the system will be much over-damped near $\dot q = 0$, and in fact for most values of $\dot q$, due to the shape of the inductance curve.

Another related argument against the manner of getting the arcsin is that the response to high frequency error functions depends on the value of $\dot q$. It seems to me that the treatment of error functions should be independent of the target speed - what is best for one will be best for another - since the prediction error we can tolerate is an absolute quantity, not dependent on the target speed. There may be some objection to this argument on the grounds that at higher target speeds the error function is apt to be larger, and hence the circuit should have a larger impedance, but even so it would only be accidental if the peculiar variation introduced by the sine gear was anything like an approximation to the desired variation.

Finally, a minor argument against the position of the sine gear is that the equation becomes so difficult to handle mathematically. A design of this type must be largely intuitive or experimental - there is not much chance of choosing the constants for the best operation by a mathematical formulation, or of determining the speed of response etc. analytically.

These difficulties might be avoided in several ways. The arcsin might, for example, be introduced as in Figure 11.

No doubt the reason this was not done was because, with $|\dot q|$ near 1, running the sin x gear backward is not mechanically practical, the gearing up ratio being too great. This objection could be overcome in two ways - either a new gear, K arcsin x to x (K large), could be used and the parameters R, L, S all decreased by a factor of K (or the integrator disks might be speeded up in suitable ratios), or, if this were not mechanically feasible, a rapid response servo mechanism could be introduced in the output, Figure 12.

This system can, by the way, be solved in closed analytic form when $\dot e$ is a constant, and reduced to a quadrature in any case. The essential feature of this circuit is that the functions of rate finding and smoothing, and of taking the arcsin, have been isolated. Each part can be designed to do its own job the best without compromise. It may be noted that the arcsin circuit above also performs a smoothing operation which depends on target speed. By suitable choice of the parameters we can make this large or small as we desire.
The Ideal Rate Finder and Smoother

Let us consider the problem of rate finding and smoothing from a general standpoint, and ask what mathematical operation a machine should perform to act as the "best possible" rate finder. Of course, this question has many answers, depending chiefly on what assumptions we make as to the input function, and what mathematical limitations we put on the machine. We shall assume throughout that the input function e(t) consists of a series of linear parts with curved connecting portions and with a small superimposed error function, and that we only desire the rate during (that is, some time after the start of) a linear part. In this section we assume there are no limitations whatever on the machine - that we can build a machine to perform any operations we can ascribe, in particular those a mathematician might use to solve the problem. Now there is considerable experimental and theoretical justification for the theory that the best way to fit a curve of a given type to a set of points subject to an observational error is in the least square sense. If we assume this to be true in our case, and attempt to fit a straight line to the last a seconds before $t_1$ of the curve e(t), we must minimize the integral

$$I = \int_{t_1-a}^{t_1}\left[e - (At + B)\right]^2 dt$$

with respect to A and B. The quantity a represents the length of the curve used in the fitting process. We would like to use as much of the curve as actually represents a linear segment to get the best accuracy, but certainly no more. A person doing the curve fitting could look at e(t) and see fairly well where the curve showed a real tendency to depart from linearity, and select accordingly. Mathematically it could be done as follows. Suppose the standard deviation of the error is $\sigma$ and that errors of more than, say, $4\sigma$ are almost certainly due to a significant departure from linearity in the curve. We could choose a such that it is as large as possible without making the error $|e - (At + B)|$ (A, B chosen to minimize I) for $t_1 - a \le t \le t_1$ greater than $4\sigma$. In other words we use as much of the curve as we can assume linear within observational errors. As a final refinement of the solution it might be desirable to include a weighting function $W(t, a)$ in the integral I, weighting the more recent values more heavily. The final evaluation of the rate is then the value of A given when we minimize the function

$$I(A, B, a) = \int_{t_1-a}^{t_1}\left[e - (At + B)\right]^2 W(t, a)\,dt$$

on A and B, a fixed, giving A and B as functions of a, and then choose a as large as possible with

$$|e - (At + B)| \le K\sigma, \qquad t_1 - a \le t \le t_1.$$
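The fitting procedure can be sketched for sampled data, with the integral replaced by a sum over samples (an assumption made here for illustration). `fit_line` solves the weighted normal equations for A and B. `adaptive_rate` grows the window backward from $t_1$ and keeps the longest one whose residuals all stay within $K\sigma$. Both names, and the optional weight function standing in for $W(t, a)$, are hypothetical.

```python
def fit_line(ts, es, ws=None):
    """Weighted least-squares straight line e ~ A t + B; returns (A, B)."""
    if ws is None:
        ws = [1.0] * len(ts)
    sw = sum(ws)
    st = sum(w * t for w, t in zip(ws, ts))
    se = sum(w * e for w, e in zip(ws, es))
    stt = sum(w * t * t for w, t in zip(ws, ts))
    ste = sum(w * t * e for w, t, e in zip(ws, ts, es))
    det = sw * stt - st * st            # nonzero for two or more distinct times
    A = (sw * ste - st * se) / det
    B = (stt * se - st * ste) / det
    return A, B

def adaptive_rate(ts, es, sigma, k=4.0, weight=None, min_points=3):
    """Rate A from the longest trailing window (ending at ts[-1]) whose residuals
    |e - (A t + B)| all stay within k*sigma. Returns (A, a), a = window length.
    `weight(t, a)` optionally plays the role of W(t, a)."""
    n = len(ts)
    best = None
    for m in range(min_points, n + 1):
        wt, we = ts[n - m:], es[n - m:]
        a = ts[-1] - wt[0]
        ws = None if weight is None else [weight(t, a) for t in wt]
        A, B = fit_line(wt, we, ws)
        if max(abs(e - (A * t + B)) for t, e in zip(wt, we)) <= k * sigma:
            best = (A, a)
    return best
```

With a flat segment followed by a ramp of slope 0.5, sampled once per second, the chosen window covers exactly the ramp and the returned rate is 0.5. The memory of past samples that this requires is the mechanical difficulty discussed next.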

This solution can be put into a more explicit form, but even when greatly simplified it appears that it would be quite difficult to carry out the calculations accurately by mechanical means. The main difficulty is that apparently such a machine must be capable of remembering exactly the past history of an arbitrary function, e or something derived from it. The only methods I know of doing this are quite inaccurate, or else very complex, and it seems likely that the gain in mathematical precision of the above formulation would be more than offset by a loss in mechanical precision.

Differential Analyzer Types of Machines 

To become a bit more practical, let us now confine our attention to machines of what might be called the differential analyzer type. By this, we mean machines constructed of a finite combination of adders, integrators, and function elements (e.g. non-circular gears). Two shafts, e(t) and kt, enter the machine and one shaft u(t) leaves the machine. It can be shown that any such system must satisfy a differential equation of the type

$$f(q, \dot q, \ldots, q^{(n)}, t) = e(t), \qquad u(t) = q^{(i)}.$$

First, we ask what can be said about the form of this equation to make the machine act as a satisfactory rate finder in our sense.

1. With the same initial conditions and the same e(t) the machine should certainly respond the same, independent of the time of start; hence f does not depend on t.

2. When e = At + B the equation must have an equilibrium solution

$$q^{(i)} = A, \qquad q^{(i+1)} = \cdots = q^{(n)} = 0.$$

If i > 1, the carriage of an integrator will be continuously moving in the equilibrium condition. This does not seem practical, for the initial conditions may be anything depending on past history, and the integrator would surely go off scale in many cases. Obviously, from the equilibrium solution, i is not 0, for this would imply a constant equal to a linear function of time. Hence i = 1 and $\dot q = u(t)$.

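3. Let

$$f(x, y) = f(x, y, 0, \ldots, 0).$$

Due to the equilibrium solution $q = At + C$, $\dot q = A$,

$$f(At + C, A) = At + B$$

for all A, B, t. Differentiating with respect to t,

$$\frac{\partial f(x, y)}{\partial x}A = A, \qquad \frac{\partial f}{\partial x} = 1,$$

so that

$$f(x, y) = x + h(y).$$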

4. Assuming f is fairly "well behaved", we have near $\ddot q = \dddot q = \cdots = q^{(n)} = 0$ (i.e. near equilibrium)

$$f \approx f(q, \dot q, 0, \ldots, 0) + a_2\ddot q + \cdots + a_n q^{(n)} = q + h(\dot q) + a_2\ddot q + \cdots + a_n q^{(n)},$$

and the differential operation depends on the coefficients $a_2, \ldots, a_n$ and $h(\dot q)$. As this differential operation should not depend on t, the $a_k$ must be independent of q, for in equilibrium q changes with t. They may depend on $\dot q$, however, in which case the differential operation depends on the target speed, which may or may not be desirable. In the deflection mechanism this is the case: after dividing its equation through by S, $a_2 = L/\left(S\sqrt{1-\dot q^{2}}\right)$.


5. With $\dot q$ near A the above reduces to

$$f \approx q + a_1\dot q + a_2\ddot q + \cdots + a_n q^{(n)} + b,$$

where $a_1 = h'(A)$ and $b = h(A) - Ah'(A)$. To eliminate backlash oscillation the roots of the characteristic equation $1 + a_1p + a_2p^2 + \cdots + a_np^n = 0$ should all be real, and for stability all should be negative, for all desired A.
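As a consistency check (a worked example added here, not in the original), the deflection mechanism can be put in this form by dividing its equation through by S, so that the coefficient of q is 1. Writing $A'$ for the equilibrium rate of the normalized input $e/S$:

$$q + \frac{R}{S}\dot q + \frac{L}{S\sqrt{1-\dot q^{2}}}\ddot q = \frac{e}{S}, \qquad h(y) = \frac{R}{S}y,$$

$$a_1 = h'(A') = \frac{R}{S}, \qquad b = h(A') - A'h'(A') = 0, \qquad a_2 = \frac{L}{S\sqrt{1-A'^{2}}},$$

$$1 + \frac{R}{S}p + \frac{L}{S\sqrt{1-A'^{2}}}p^{2} = 0 \ \text{has real roots} \iff \frac{R^{2}}{S^{2}} \ge \frac{4L}{S\sqrt{1-A'^{2}}} \iff R^{2} \ge \frac{4LS}{\sqrt{1-A'^{2}}}.$$

So the requirement of real roots reproduces the condition $R^2 \ge 4L_{\text{eff}}S$ found under Differential Operation. The roots are then negative because all coefficients are positive.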

6. For complete stability, there are no doubt further requirements on the form of f. This problem, however, is still unsolved.

The above are only requirements on the form of f so that it actually does find a satisfactory rate. To find the best form of f would require a very elaborate mathematical analysis, if possible at all.

If we restrict our machine still further and assume a linear differential equation with constant coefficients, it is possible to give a fairly rational analysis leading to the best values of the coefficients. The question is this. Given the equation

$$a_0q + a_1\dot q + \cdots + a_nq^{(n)} = e,$$

what values of the coefficients $a_0, \ldots, a_n$ give the best rate-finding and smoothing properties? From what we said above, it seems that the characteristic equation

$$a_0 + a_1p + \cdots + a_np^n = 0$$

should have only real negative roots, and that the rate found will be $\dot q$. We may normalize the equation by assuming $a_0 = 1$ so that $\dot q$ is actually the rate and not merely proportional to it. In the Heaviside symbolic notation, we have

$$\dot q = \frac{p\,e}{(b_1p + 1)(b_2p + 1)\cdots(b_np + 1)},$$

writing the polynomial in the factored form. The $b_k$ are positive real numbers and are the time constants in the transient part of the response. We assume the $b_k$ arranged in increasing magnitude.

Let us frame the problem as follows. Keeping the speed of response of the circuit the same, what values of the $b_k$ give the best attenuation of the error function? Of course, the trouble appears in trying to decide what we mean by keeping the speed of response the same. The answer is that we keep the maximum time constant, that is $b_n$, the same. This may be partially justified on the following grounds:

1. For "almost all" initial conditions, the term $A_n e^{-t/b_n}$ will eventually dominate the transient response, the other terms becoming arbitrarily small in comparison. The only time when this fails is when the coefficient happens to come out zero.

2. In the worst cases (other coefficients small in comparison) the $b_n$ term dominates for all t, and the machine should perhaps be designed with the worst conditions as governing.

3. If we use this criterion, it is easy to show that for best attenuation of error frequencies all the $b_k$ should be equal. For the magnitude of the transfer admittance (e to q') is

$$|T(j\omega)| = \frac{\omega}{\prod_{k=1}^{n} \sqrt{1 + b_k^2 \omega^2}},$$

which is obviously smallest when each $b_k$ is made as large as possible, for all frequencies. That is, each $b_k = b_n$, the maximum.
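The comparison in item 3 can be checked numerically. The sketch below evaluates $|T(j\omega)| = \omega/\prod_k\sqrt{1+b_k^2\omega^2}$ for two sets of time constants that share the same maximum $b_n = 4$. In the first set all $b_k$ are equal. The second set has smaller leading constants. Both sets and the sample frequencies are arbitrary illustrations.

```python
import math

def admittance_magnitude(omega, b):
    """|T(j omega)| for q' = p e / prod(1 + b_k p), given time constants b."""
    mag = omega
    for bk in b:
        mag /= math.sqrt(1.0 + (bk * omega) ** 2)
    return mag

equal = [4.0, 4.0, 4.0]      # all time constants at the maximum b_n
unequal = [1.0, 2.0, 4.0]    # same b_n, smaller b_1 and b_2

for omega in (0.1, 0.5, 1.0, 5.0):
    # At every frequency the equal set gives the smaller magnitude.
    print(omega, admittance_magnitude(omega, equal), admittance_magnitude(omega, unequal))
```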

Another way the "same speed of response" might be in- 
terpreted is in terms of the expected area under the transient 
time curve. Keeping the standard deviation of this area con- 
stant seems to give the same evaluation of the b k as above but 
there are certain statistical assumptions in my proof that may 
render it invalid. 

If the characteristic equation has real roots, it may 
be set up nicely as in Figure 13. 

This circuit appears to have an advantage from the backlash point of view over the more obvious one shown in Figure 14.

It seems quite possible, however, that the use of a nonlinear equation could offer a real advantage. Consider the equation

$$S(q'')\,q'' + K(q'')\,q' + L(q'')\,q = e,$$

where the three coefficients are functions of q''. When the system is at or near equilibrium, q'' is nearly zero and the system acts approximately like

$$S(0)\,q'' + K(0)\,q' + L(0)\,q = e.$$

These three constants could be adjusted to give critical damping and a good attenuation of the error function frequencies. On the other hand, when we are not at or near equilibrium, q'' is (usually) considerably different from zero. The values of the three coefficients could then be adjusted to give a very rapid response, and thus approach the equilibrium position faster. It is possible, however, that there is some fundamental error in this reasoning; for example, that an attempt to do this would necessarily cause oscillation.




Claude E. Shannon


The schematic diagram of a new type of height data smoothing mechanism is shown in Figure 1. The discontinuous height data e(t) is fed into the input shaft at intervals. This drives a differential, connected also to the ball carriage and roller of an integrator whose disk is turned by a constant speed motor. A correcting hand wheel and the integrator roller feed another differential whose output is the output of the device. The output and input of the machine are compared through a differential feeding a dial. The operator is supposed to turn the handwheel in such a way that the positive and negative oscillations of the dial about zero are equal.

The actual height of the target h(t) is a continuous function of time, and we may assume that just after each reading e(t) is an approximation to this. Thus h(t) and e(t) might be as shown in Figure 2.

The shaft y(t) clearly satisfies the equation

$$y + \frac{1}{m}\,y' = e(t). \qquad (1)$$

The x shaft satisfies

$$x(t) = y(t) + c(t), \qquad (2)$$

and the dial reads

$$D(t) = e(t) - x(t). \qquad (3)$$

During the period between height readings the position of the e(t) shaft is constant, say $e(t_n)$, the reading taken at $t_n$. Then

$$y + \frac{1}{m}\,y' = e(t_n),$$

$$y = e(t_n) + \left[y(t_n) - e(t_n)\right] e^{-m(t - t_n)}, \qquad t_n \le t \le t_{n+1}.$$

Since y is obviously continuous, it will follow a curve consisting of a series of connected exponentials, each with the same time constant, 1/m. The continuity of the curve implies

$$y(t_{n+1}) = e(t_n) + \left[y(t_n) - e(t_n)\right] e^{-m(t_{n+1} - t_n)}.$$

Assuming the intervals between readings the same, say a seconds, the response y for two different time constants, $ma = \ln 2$ and $ma = \ln 10$, is shown in Figure 3.
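Because e(t) is constant between readings, the exponential segments can be advanced exactly from one reading to the next with the continuity relation for y. The sketch below does this for a hypothetical sequence of equal readings. The helper name `smooth_steps` and the sample readings are illustrative.

```python
import math

def smooth_steps(readings, ma, y0=0.0):
    """Values y(t_n) at successive reading times, for intervals of length a.

    Uses y(t_{n+1}) = e(t_n) + [y(t_n) - e(t_n)] * exp(-m a).
    """
    decay = math.exp(-ma)
    ys = [y0]
    for e_n in readings:
        ys.append(e_n + (ys[-1] - e_n) * decay)
    return ys

readings = [1.0, 1.0, 1.0, 1.0]
print(smooth_steps(readings, math.log(2)))   # the gap to e halves each interval
print(smooth_steps(readings, math.log(10)))  # the gap shrinks to one tenth
```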

The larger the time constant, the more the lag in response of y(t), but the smoother the curve. This may be seen another way: the e to y system is equivalent to an R, L circuit with position of shafts analogous to voltage, as shown in Figure 4. With the time constant 1/m small, y follows e closely, including the irregularities; with 1/m large, y(t) is smooth compared to e but lags considerably.

Movement of the hand wheel does not affect y(t) but shifts x(t) up or down with respect to y. If the operator turns the wheel to give equal positive and negative movements of the dial, it may be seen that in the "steady state" (say with a target height increasing linearly with time) there is a constant lag even when the damping is low and the interpolation nearly linear. In this case the system bridges linearly between the mid-ordinates of the steps, while actually it should bridge between the points $t_n + 0$. With higher damping the shape becomes worse, but the interpolated exponentials are nearer to the true curve most of the time. We shall find a formula for the best time constant of the system under the following assumptions:

1. That the "best" time constant is the one making the 
actual error least in the mean square sense. 

2. That we may take as the true curve, so far as our 
knowledge goes, the linear Interpolation between 
the points t Q + 0. This may be justified by the 
faot that the device cannot in any way perform 
higher order interpolation - the curve y(t) is con- 
vex upward whenever e(t) inoreased in its last step 
over the final value of y from the preceding step, 
and this is quite independent of the curvature of 

3. That the system is In a "steady state", that is, 
that in the step under consideration y(t) ends at 
the aajaa distance below e(t) as it was Just before 
the step. 

4. riiat the steps come at approximately equal inter- 
vals or a seconds. 

An interval under these conditions is shown in Figure 5. Here we assumed that the hand wheel was turned to give a ratio of $-k/(1-k)$ for the deflection of the dial just after to just before a step.

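We have, taking e = b during the interval 0 < t < a and measuring t from the step,

$$y = b - A e^{-mt}.$$

In the steady state the distance e − y just after a step exceeds its value just before the step by the step height b, so that

$$A = b + A e^{-ma}, \qquad A = \frac{b}{1 - e^{-ma}}.$$

With the hand wheel set so that the dial reads kb just after the step, the error of the output x relative to the linear interpolation is

$$s = y - y(0) - \frac{bt}{a} - kb = \frac{b\,(1 - e^{-mt})}{1 - e^{-ma}} - \frac{bt}{a} - kb.$$

The integral of the squared error per second is then

$$\frac{1}{a}\int_0^a s^2\,dt,$$

which, apart from the factor $b^2$, is a function of k and of $D = ma$ only.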

It is evident from physical considerations that the minimum of this expression occurs for a fairly large D. In fact the error curve was plotted for k = .5 (Figure 6), and the minimum is seen to be at about 7 or 8. With D this large the above expression is very nearly equal to

$$b^2\left[k^2 - k + \frac{1}{3} - \frac{3 - 4k}{2D} + \frac{2}{D^2}\right],$$

since $e^{-D}$ is very small. To locate the minimum we have

$$\frac{3 - 4k}{2D^2} - \frac{4}{D^3} = 0,$$

$$(6 - 8k)\,D = 16, \qquad D = \frac{8}{3 - 4k}.$$

For k = ½,

$$D = 8.$$

Since the minimum is so flat (Figure 6), this formula is certainly close enough. However, a second approximation may be found as follows: for x small, $1/(1 - x) \approx 1 + x$. Using this in the exact expression to eliminate the denominators, we get as a second approximation an equation for D which also contains terms in $e^{-D}$. Using the first approximation to obtain the values of the terms involving exponentials, a better value may be obtained. For k = ½ the second approximation is D = 8.03. The first and second approximations are plotted in Figure 7.
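The first approximation can be checked by minimizing the large-D form of the mean square error directly. The sketch below assumes that form, $k^2 - k + 1/3 - (3-4k)/(2D) + 2/D^2$ in units of $b^2$, and compares a brute-force grid search with $D = 8/(3-4k)$. The closed form gives a positive D only for k < 3/4. The function names and grid limits are illustrative.

```python
def mean_square_large_d(D, k):
    """Large-D approximation to the mean square error per second, in units of b^2."""
    return k * k - k + 1.0 / 3.0 - (3.0 - 4.0 * k) / (2.0 * D) + 2.0 / D ** 2

def best_d(k):
    """Stationary point of the large-D approximation, valid for k < 3/4."""
    return 8.0 / (3.0 - 4.0 * k)

def grid_minimum(k, lo=1.0, hi=40.0, step=0.001):
    """Brute-force minimizer of the approximation over a grid of D values."""
    n = int((hi - lo) / step)
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=lambda D: mean_square_large_d(D, k))

for k in (0.25, 0.5):
    print(k, best_d(k), grid_minimum(k))
```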

With k = ½ the curve x(t) is plotted for an interval with the "best" D in Figure 8. It will be noted that the curve is highly damped in comparison to the time between readings. The RMS error is then approximately .23b.

It is interesting to compare this with the RMS errors obtained under other conditions. If the device is not used at all, but a direct coupling is made between the input and output, the RMS error between the step function and the linear interpolation between the points $t_n + 0$ is

$$\left[\frac{1}{a}\int_0^a \left(0 - \frac{bt}{a}\right)^2 dt\right]^{1/2} = \frac{b}{\sqrt{3}} = .577\,b,$$

so that the RMS error has been reduced to 40% of this value.
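The 40% figure can be reproduced from two pieces. The first is the large-D approximation $k^2 - k + 1/3 - (3-4k)/(2D) + 2/D^2$, evaluated at k = ½ and D = 8; this form is an assumption carried over from the first-approximation analysis. The second is a direct numerical integration of the no-device error $bt/a$ over one interval. Values below are in units of b.

```python
import math

def mean_square_large_d(D, k):
    """Large-D approximation to the mean square error per second, in units of b^2."""
    return k * k - k + 1.0 / 3.0 - (3.0 - 4.0 * k) / (2.0 * D) + 2.0 / D ** 2

# No device: the error is t/a over one interval (tau = t/a); midpoint rule.
N = 100_000
direct_ms = sum(((i + 0.5) / N) ** 2 for i in range(N)) / N
direct_rms = math.sqrt(direct_ms)          # close to 1/sqrt(3) = .577

device_rms = math.sqrt(mean_square_large_d(8.0, 0.5))
print(direct_rms, device_rms, device_rms / direct_rms)  # ratio roughly 0.4
```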

In Figure 9, the output of the smoothing mechanism, x(t), is plotted for a certain forcing function e(t), using the "best" value of m. It may appear that the output is still far from smooth, and this is in a sense true, but it must be remembered that the variations in e(t) are here greatly exaggerated over what would be expected in practice.

Finally, it should be pointed out that a very material improvement in operation could be obtained if the operator were trained to turn the handwheel to obtain a value of k nearer to zero than ½. This, however, would probably be im-





Claude E. Shannon 

June 26, 1941 

Some Experimental Results on the Deflection Mechanism 

In a previous report, "A Study of the Deflection Mechanism and Some 
Results on Rate Finders," a mathematical study mis made of a new type of 
defleotion mechanism. The present paper is a further study of this de- 
rice and a report on same experimental results obtained on the M.I.T. 
differential analyser. 

For oonvenienoe in reference, the schematic diagram of the machine 
is repeated in Fig. 1. In the report mentioned, the utility of the 
middle part of the device -was questioned. This arose from a misunder- 
standing of the basic assumptions underlying the design and was oleared 
up in a conference with Dr. Tappert. The writer's analysis was under 
the assumption that the mechanism was designed to find rates for linear 
forcing functions only (i.e., that higher order terms were small by com- 
parison) , and the analysis is still valid if this is true. However, in 
practice, it appears necessary to assume higher order forcing functions 
and the deflection mechanism is designed to give the oorreot steady state 
rate (exoept for the non-linearity of the sine gear) for an arbitrary 
quadratio foroing function. Actually' the middle part (often referred to 
hereafter as the "x" part) of the devioe is certainly well worth while, 
as will be seen from some of our experimental curves. 

If a linear mechanism has a transfer admittance T(jω) from input e(t) to output $\dot q(t)$, then

$$j\omega\,Q(j\omega) = T(j\omega)\,E(j\omega),$$

where E and Q are the transforms of e and q. It is easily seen from transform theory that if e(t) = at + b, a necessary and sufficient condition that $\dot q(t) \to a$ as $t \to \infty$ is that

$$T(0) = 0, \qquad T'(0) = 1,$$

the derivative being taken with respect to jω. If this condition is satisfied the system may be called a first order rate finder: after the transient has died out, the output is the derivative of the input whenever the latter is linear. Similarly, if

$$T(0) = 0, \qquad T'(0) = 1, \qquad T^{(k)}(0) = 0, \quad k = 2, 3, \dots, n,$$

we have an nth order rate finder: in the steady state it finds the rate of an nth degree polynomial forcing function. In the deflection mechanism we have a second order rate finder,

$$T(j\omega) = j\omega + c_3 (j\omega)^3 + c_4 (j\omega)^4 + \cdots,$$

if we assume $\sqrt{1 - \dot q^2}$ nearly 1. A circuit for solving

$$\delta = \sin^{-1} \dot q$$

under the same approximation, to the nth order, is shown in Fig. 2. The admittance here is approximately

$$T(j\omega) = \frac{j\omega\,\left[1 + a_1 j\omega + \cdots + a_{n-1}(j\omega)^{n-1}\right]}{1 + a_1 j\omega + a_2 (j\omega)^2 + \cdots + a_{n+1}(j\omega)^{n+1}}.$$

The values of the constants in the mechanism are such that

$$T(j\omega) = \frac{j\omega\,(1 + 4.63\,j\omega)}{1 + 4.63\,j\omega + 5.73\,(j\omega)^2 + 1.094\,(j\omega)^3}.$$
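The rate-finder conditions can be checked on the deflection-mechanism admittance $T(p) = p(1 + 4.63p)/(1 + 4.63p + 5.73p^2 + 1.094p^3)$, with $p = j\omega$. The method is to expand T in powers of p, since the coefficient of $p^k$ equals $T^{(k)}(0)/k!$. The sketch below does the power-series division in plain Python. The helper name `series_divide` is illustrative.

```python
def series_divide(num, den, terms):
    """First `terms` coefficients of num(p)/den(p) as a power series in p.

    num and den list coefficients from p^0 upward; den[0] must be nonzero.
    """
    out = []
    for k in range(terms):
        nk = num[k] if k < len(num) else 0.0
        acc = nk - sum(den[j] * out[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        out.append(acc / den[0])
    return out

# T(p) = p (1 + 4.63 p) / (1 + 4.63 p + 5.73 p^2 + 1.094 p^3)
num = [0.0, 1.0, 4.63]
den = [1.0, 4.63, 5.73, 1.094]
t = series_divide(num, den, 5)
print(t)  # t[0] = 0, t[1] = 1, t[2] = 0: a second order rate finder
```

The first nonzero correction is $t_3 = -5.73$, which is the coefficient $c_3$ in the expansion of T.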

In the previous report it was pointed out that, due to a clutch and stop on the input to the sine gear, values of $\dot q$ were limited to two horizontal lines (see Fig. 6 in that report). There is also a clutch and stop on the displacement of the lower integrator. This effectively further limits solutions to a parallelogram as shown in Fig. 3. Actually the limitation is fictitious: the q shaft can turn an unlimited amount, but when this stop is in effect the stability point moves at such a speed as to be equivalent to q and $\dot q$ moving along one side of the parallelogram. Thus if we keep the stable point stationary, paths of representative solutions will be as indicated in Fig. 3.

The trial solutions taken on the differential analyser may be classified as follows:

I. Solutions taken with the mechanism as designed.

A. Simple analytic forcing functions.

1. e(t) = a
2. e(t) = at + b
3. e(t) = at² + bt + c
4. e(t) = at³ + bt² + ct + d

B. Response for 8 typical target courses, the target vector velocity constant.

C. The response to some error functions superposed on typical courses.

D. An attempt to get backlash oscillation.

II. Approximately the same program, although less extensively, with the middle part eliminated.

III. A few runs with typical courses using three different third order rate finders.

The constants of the target courses used were as follows (see Fig. 4):

Course I: $S_g$ = 150 yds/sec (307 mi/hr), V = 2,000 yds, $h_m$ = 1,000 yds, θ = 0°
Course II: $S_g$ = 150 yds/sec, V = 2,000 yds, $h_m$ = 500 yds, θ = 0°
Course III: $S_g$ = 150 yds/sec, V = 4,000 yds, $h_m$ = 1,000 yds
Course IV: $S_g$ = 150 yds/sec, V = 2,000 yds, $h_m$ = 2,000 yds
Course V: $S_g$ = 150 yds/sec, V = 4,000 − 40t yds, $h_m$ = 4,000 yds, θ = −14.96°
Course VI: $S_g$ = 150 yds/sec, V = 2,000 − 40t yds, $h_m$ = 1,000 yds, θ = −14.96°
Course VII: $S_g$ = 96.6 yds/sec, V = 3,000 − 115t yds, $h_m$ = 1,000 yds, θ = −50°
Course VIII: $S_g$ = 150 yds/sec, V = 4,000 yds, $h_m$ = 500 yds

The distribution of these courses is indicated in Fig. 5, together with the approximate maximum range of the 3-inch A.A. gun (21 sec. fuse setting).

The actual input to the deflection mechanism is

r* s h t 

a o p 

but since it was desired to compare the actual output with the true value

$$\sin^{-1}\dot e,$$

the quantity $\dot e$ was plotted against t and integrated to provide the input. To calculate $\dot e$ the following method was found to be the simplest. We have

8 h t 

' --P **- 

o p 

A computation schedule was set up based on this formula, working backwards from the time of burst $t + t_p$ to the present time t:

$$r_p = \sqrt{h_m^2 + S_g^2\,(t + t_p)^2 + \left[V + (t + t_p)\,S_g \tan\theta\right]^2}.$$


The ballistic data used in getting $t_p$ (IV) was read from the chart (Fig. 24, opposite p. 59), Coast Artillery Field Manual, FM 4-110. The value of $t_p$ was merely read off corresponding to the computed values of $r_p$ and $h_p$.

If we assume as an approximation that the shell velocity is constant, k yds/sec (i.e., that the equal time-of-flight curves in the chart are circles), so that with V constant

$$k^2 t_p^2 = h_p^2 + V^2,$$

$$h_p^2 = h_m^2 + S_g^2\,(t + t_p)^2,$$

$$h_o = \sqrt{h_m^2 + S_g^2 t^2},$$

we can eliminate $t_p$ and $h_p$ from the system to obtain the following equation between $\dot e$ and t:


e 2 [k 2 (h m *S g t) 2 (h^ 2 )- (h 2 *S^)V 2 S 2 ] 

+ *[2 vsWhfVTt 2 ] - C^5T 2 *TT 2 (h *ts ) 2 ] - o 

g m n g ' 1 g m g m* m g' J 

Evidently the same curve $\dot e(t)$ is obtained if $h_m$ and $S_g$ are both multiplied by the same constant.
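Under the constant-shell-velocity approximation, combining $k^2 t_p^2 = h_p^2 + V^2$ with $h_p^2 = h_m^2 + S_g^2 (t+t_p)^2$ gives $k t_p = \sqrt{h_m^2 + S_g^2 (t+t_p)^2 + V^2}$ for the time of flight at present time t. The sketch below solves this by bisection. It assumes k > $S_g$, which makes the left side eventually overtake the right. The shell velocity of 800 yds/sec in the example is a hypothetical value, not one taken from the report.

```python
import math

def time_of_flight(t, h_m, S_g, V, k, tol=1e-12):
    """Solve k t_p = sqrt(h_m^2 + S_g^2 (t + t_p)^2 + V^2) for t_p >= 0.

    Assumes a constant shell velocity k (yds/sec) greater than S_g.
    """
    def g(tp):
        return k * tp - math.sqrt(h_m**2 + (S_g * (t + tp))**2 + V**2)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0:          # g increases with t_p when k > S_g; widen bracket
        hi *= 2.0
    while hi - lo > tol:        # bisection on the sign of g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: course I geometry with an assumed k = 800 yds/sec.
print(time_of_flight(t=0.0, h_m=1000.0, S_g=150.0, V=2000.0, k=800.0))
```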

The differential analyzer set-up used is shown in Fig. 6. An attempt was made to generate the sine function with two integrators, but this was found impractical because of the large integrator loading necessary, and an input table was used instead. Even in this case it was necessary to use a very large scale factor on the independent variable shaft, due to the small integrating factors (1/32) of the differential analyzer as compared to the ball type (about 1 under comparable conditions). This resulted in solutions which represented, actually, 30 seconds requiring 30 minutes of machine time.

The equations of the deflection mechanism are

$$\dot x + .54\,x = .54\,\dot e,$$

$$\ddot q + 4.700\,\dot q + 1.692\,q = 1.692\,e + 4.700\,x.$$

It was necessary to approximate the coefficients with available gear ratios on the differential analyzer. Fortunately some very close approximations were found. The equations actually set on the machine were

$$\dot x + .54\,x = .54\,\dot e,$$

$$\ddot q + 4.706\,\dot q + 1.694\,q = 1.694\,e + 4.706\,x.$$

The error is of the same order as the expected machine error.
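Eliminating x between the two linear equations gives the overall admittance. Writing p for d/dt, the first equation gives $x = .54\,p\,e/(p + .54)$, so that

$$\frac{q}{e} = \frac{1.692\,(p + .54) + 4.700(.54)\,p}{(p + .54)(p^2 + 4.700\,p + 1.692)}.$$

The sketch below uses NumPy's polynomial helpers to form this ratio and normalize the constant terms to 1. With the design values it recovers coefficients close to 4.63, 5.73 and 1.094. The numerator and denominator also share the same first-order coefficient, which is what a second order rate finder requires. The function name is illustrative.

```python
import numpy as np

def overall_coefficients(c1, c0, r=0.54):
    """Normalized coefficients of q/e for
        x' + r x = r e',   q'' + c1 q' + c0 q = c0 e + c1 x.
    Returns (numerator, denominator) from p^0 upward, constant terms scaled to 1.
    """
    P = np.polynomial.polynomial          # coefficients ordered from p^0 upward
    num = P.polyadd(P.polymul([c0], [r, 1.0]), [0.0, c1 * r])
    den = P.polymul([r, 1.0], [c0, c1, 1.0])
    return list(num / num[0]), list(den / den[0])

print(overall_coefficients(4.700, 1.692))  # design values
print(overall_coefficients(4.706, 1.694))  # values set on the analyzer
```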

Except for runs In group ID the. machine was made as "tight" as pos- 
sible, the backlash being corrected by frontlash units. Due to the large 
scale factors used and the high inherent precision of the integrators used 
in the differential analyeer, the rune ray be expected to be more accurate 
than the actual deflection mechanism. 

Solutions were taken in the form of both curves and counter readings. 
The ourves given here -were reproduced by pantograph to ordinary graph 
paper size. Curves not directly drawn by the machine and numerioal values 
quoted are taken from the counter printings, which give an additional 
decimal plaoe not readable from the ourves. 

Discussion of Runs 

Most of the curves are given with $\dot q$ as dependent variable. To estimate the error in yards for a given error in $\dot q$ from $\dot e$, the chart of Fig. 6A may be used. This is computed from the approximate formula

$$\text{error (yds)} \approx r\,A(\dot e, \dot q)\,\Delta\dot q.$$

For rough comparisons the coefficient A may be taken as 1, the error then being the $\dot q$ error multiplied by the predicted range.

The first set of runs taken were with a sudden impulse $e = k\cdot 1(t)$ with the system at rest, both with and without the middle part of the mechanism. Runs were taken with

k = 0.1, 0.2, 0.4, 1.0, 2.0.

Typical curves are shown in Figs. 7 and 8. The results are very close to computed curves on the assumption that $1/\sqrt{1 - \dot q^2} = 1$ when k < .4, but above this the non-linearity becomes appreciable. In the worst cases the transient disappeared to within machine errors in 25 seconds, and for most cases within 8 to 12 seconds. The action with the middle part out was considerably more rapid than with it in, the transient being 6 times as great, as had been predicted, this being a special case of a linear forcing function. Fig. 9 is a plot of the time required for the transient in $\dot q$ to reduce to 2/10 of its maximum value. For values of k greater than about .35 the curves cross the axis once with the middle part in. The curves with it out are all identical with k > 2, due to the action of the slip clutch on one integrator.


Next a series of runs were taken with

$$e = kt\cdot 1(t),$$

starting from rest, with

$$\sin^{-1} k = \text{steady state } \delta = 15^\circ, 30^\circ, 45^\circ, 60^\circ, 75^\circ, 80.6^\circ,$$

the last being the limit of the sine gear, the maximum possible deflection. These runs are shown in Figs. 10 and 11. The transient died out in all cases within 20 seconds except with x in for δ > 75°, in which cases 30 seconds or more was required, due to the action of the slip clutch. These long transients, however, would probably not be troublesome, since such large deflections would only occur in practice with the plane almost directly overhead. For the smaller values the response is about equally rapid with x in or out.

Quadratic Forcing Functions

The runs with a quadratic forcing function

$$e = at^2$$

were the first to show the superiority of the mechanism with x in. Runs were taken with

a = .01, .02, .03, .04, .10.

With a quadratic rate finder the solution $\dot q$ should approach 2at, and with x in this was very nearly true, the discrepancy being due to the sine gear. Some solutions are shown in Figs. 12, 13, and 14. The errors increase with a and with $\dot q$. The maximum slope of $\dot e$ found in any of the courses plotted is about equivalent to an a of .05, so that the large errors due to the sine gear with a = .10 need not cause great concern.
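The statement that $\dot q$ approaches 2at can be reproduced for a linear approximation of the mechanism. The sketch below assumes the system $1.094\,\dddot q + 5.73\,\ddot q + 4.63\,\dot q + q = 4.63\,\dot e + e$ and ignores the sine gear. It integrates that system with e = at², starting from rest, using a fixed-step fourth-order Runge-Kutta scheme. The step size, run length and value of a are arbitrary choices.

```python
def simulate_quadratic(a, t_end=60.0, dt=0.01):
    """Return (t, q_dot) at t_end for e = a t^2, starting from rest.

    Linear approximation: 1.094 q''' + 5.73 q'' + 4.63 q' + q = 4.63 e' + e.
    """
    def deriv(t, s):
        q, qd, qdd = s
        e = a * t * t
        e_dot = 2.0 * a * t
        qddd = (4.63 * e_dot + e - q - 4.63 * qd - 5.73 * qdd) / 1.094
        return (qd, qdd, qddd)

    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    s, t = (0.0, 0.0, 0.0), 0.0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(t, s)
        k2 = deriv(t + dt / 2, shift(s, k1, dt / 2))
        k3 = deriv(t + dt / 2, shift(s, k2, dt / 2))
        k4 = deriv(t + dt, shift(s, k3, dt))
        s = tuple(si + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                  for si, a1, a2, a3, a4 in zip(s, k1, k2, k3, k4))
        t += dt
    return t, s[1]

t, q_dot = simulate_quadratic(0.02)
print(t, q_dot, 2 * 0.02 * t)  # q_dot should be close to 2 a t
```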


Cubic Forcing Functions

For cubic forcing functions the following were used:

$$e_1 = -.04\,t^3 + .1\,t^2$$

$$e_2 = -.001\,t^3 + .05\,t^2$$

$$e_3 = -.0002\,t^3 + .02\,t^2$$

These were chosen as having second order tangency at t = 0, so that the transient is small. The results are shown in Figs. 15 and 16. The responses with $e_2$ and especially $e_3$ are very close to the calculated values on assuming the equation linear. The error with $e_1$ is somewhat greater, as in the quadratic case with higher acceleration.
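The linear theory predicts the size of the cubic error. For a polynomial input, the steady-state output of a linear rate finder is $\sum_k t_k e^{(k)}(t)$, where the $t_k$ are the power-series coefficients of T(p). For the linearized deflection mechanism $t_2 = 0$ and $t_3 = -5.73$. A cubic term $\alpha t^3$ therefore leaves a constant error of $6 t_3 \alpha$ in $\dot q$. The sketch below computes this error for the $t^3$ coefficient of $e_2$. It illustrates the linear theory only and ignores the sine gear.

```python
def series_divide(num, den, terms):
    """First `terms` power-series coefficients of num(p)/den(p) (lists from p^0 up)."""
    out = []
    for k in range(terms):
        nk = num[k] if k < len(num) else 0.0
        acc = nk - sum(den[j] * out[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        out.append(acc / den[0])
    return out

def cubic_steady_error(alpha, num, den):
    """Steady-state q_dot minus e_dot for e = alpha t^3 + (lower-order terms).

    For a second order rate finder only the third-derivative term survives:
    error = t_3 * e''' = t_3 * 6 * alpha.
    """
    t = series_divide(num, den, 4)
    assert abs(t[0]) < 1e-12 and abs(t[1] - 1.0) < 1e-12 and abs(t[2]) < 1e-12
    return t[3] * 6.0 * alpha

num = [0.0, 1.0, 4.63]              # p (1 + 4.63 p)
den = [1.0, 4.63, 5.73, 1.094]      # 1 + 4.63 p + 5.73 p^2 + 1.094 p^3
print(cubic_steady_error(-0.001, num, den))  # alpha taken from e_2
```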

Effect of Backlash

A number of runs were made to determine the effect of backlash using several different forcing functions. In order to increase the amount of backlash, frontlash units were inserted at several critical points in the backwards direction. The results of these runs were, however, completely negative, for no oscillation of any sort was discovered. The system was given "shocks" by sudden turning of the e shaft and other methods, but the solutions were completely stable. The only results were small consistent errors, of the order of magnitude of the backlash. It is possible that, due to the large scale factors used in the set-up, even the artificially introduced backlash was not sufficient to cause the oscillation effect.

Response for Typical Courses 

The responses for the 8 courses described above are shown in Figs. 17 to 24. It may be noted that even on the flat courses (e.g., IV) the operation is poor without x. On the flat courses the response is satisfactory with x, the error being less than 20 yards except sometimes at the hump in e. However, for the steeper courses errors of 60 or more yards are common after the start of the peak, which do not disappear until nearly the end of the course. The action is particularly bad coming down the hump. Fig. 25 is a plot of the error in yards with course VIII, x in.


Response to Error Functions 

In Figs. 26 to 28 are shown the responses to some random error functions of various kinds superimposed on courses I and II. The operation in damping out the error is considerably better with x out. However, it seems from a consideration of the size of the errors introduced and the responses found that the system, even with x in, damps the errors more than necessary. That is, it might be preferable to increase the speed of response so as to reduce the transient errors in the solutions.

Figs. 29 and 30 show the responses when we suddenly start tracking a target in courses I or II with the machine previously at rest, with the target at several points along the course.

Tests with Different Equations 

Three runs were made on course VIII, the most difficult one of the group, using three different cubic rate finding equations. The equations used were (assuming linearity) critically damped, with the transfer admittances

$$(1)\quad T(j\omega) = \frac{j\omega\,\left[1 + 8(j\omega) + 24(j\omega)^2\right]}{\left[1 + 2(j\omega)\right]^4}$$

$$(2)\quad T(j\omega) = \frac{j\omega\,\left[1 + 4(j\omega) + 6(j\omega)^2\right]}{\left[1 + (j\omega)\right]^4}$$

$$(3)\quad T(j\omega) = \frac{j\omega\,\left[1 + 2(j\omega) + 1.5(j\omega)^2\right]}{\left[1 + \tfrac{1}{2}(j\omega)\right]^4}$$

The results of these runs are shown in Figs. 31, 32, and 33 and should be compared with Fig. 24. Of course, this gain is accompanied by a loss in error function damping. With the roots equal to 2 the system had a slight tendency to be unstable on the flat part of the course. This, however, appeared to be due to the "human backlash" in the operator on the sine table and would probably not be present with a sine gear.
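Critically damped cubic rate finders can be checked in the same way. Assume each has the form $T = p\,N(p)/(1+\tau p)^4$, where N is $(1+\tau p)^4$ truncated after $p^2$; the third order rate-finder conditions force this numerator. The sketch below confirms $t_1 = 1$ and $t_2 = t_3 = 0$ for τ = 2, 1 and ½, which correspond to characteristic roots ½, 1 and 2. It also shows the first nonzero error coefficient, $t_4 = -4\tau^3$.

```python
from math import comb

def series_divide(num, den, terms):
    """First `terms` power-series coefficients of num(p)/den(p) (lists from p^0 up)."""
    out = []
    for k in range(terms):
        nk = num[k] if k < len(num) else 0.0
        acc = nk - sum(den[j] * out[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        out.append(acc / den[0])
    return out

def critically_damped_cubic(tau):
    """Numerator and denominator of T(p) = p N(p) / (1 + tau p)^4, from p^0 up."""
    den = [comb(4, j) * tau**j for j in range(5)]
    num = [0.0] + den[:3]        # p times (1 + tau p)^4 truncated after p^2
    return num, den

for tau in (2.0, 1.0, 0.5):
    t = series_divide(*critically_damped_cubic(tau), 5)
    print(tau, t)  # t[1] = 1, t[2] = t[3] = 0, t[4] = -4 tau^3
```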

It is easily seen that an increase in the values of the characteristic roots of the equation demands a proportional increase in the power requirements of the integrators. It may be that this will be a design limit in the case of mechanical systems. No difficulty would be experienced here, however, with electrical integrators.


The main conclusions of this work are as follows: 

1. The middle part of the machine is definitely worth while. 
Although it increases response for accidental following errors, the gain 
in behavior for actual courses more than offsets this disadvantage. 

2. The system behaves nearly enough like the linear system

$$1.094\,\dddot q + 5.73\,\ddot q + 4.63\,\dot q + q = 4.63\,\dot e + e$$

that this may be used to calculate its response to within a few per cent, providing $\dot q < .6$. As this corresponds to a deflection of 37°, the approximation is sufficient for most cases.

3. For targets whose elevation at their nearest point is greater than about 50°, fairly large errors occur due to substantial cubic and higher degree terms in e. This indicates that it might be worth while to use a higher order rate finder. Tests made with a cubic rate finder showed greatly improved results.

4. If the additional cost of another integrator and adder required for cubic rate finding is too great to be justified, it appears that the system could be improved by reducing the time constants, for if sufficient power is available from the integrators, the only disadvantage would be increased response to random error functions, and our results indicate that they are now damped out more than necessary.

5. There is some indication that better results would be obtained by making the three time constants equal, or more nearly equal than they are now, although this is not certain.



Criteria for Consistency and Uniqueness in Relay Circuits

September 8, 1941

In a system of linear algebraic equations, there are three possible types of degeneracy, namely inconsistency (no possible solution), ambiguity (solutions not uniquely determined) and redundancy (more equations than necessary). Necessary and sufficient conditions are known for these types of degeneracy in terms of the ranks of the coefficient and augmented matrices. Somewhat similar effects can occur in the Boolean equations characterizing relay circuits, giving rise respectively to chattering, ambiguity of relay position for certain values of the independent variables, and redundancy [...] discriminant $f$.

Consider a relay circuit containing $n$ relays $R_1, R_2, \dots, R_n$. Make and break contacts on $R_i$ are designated $a_i$ and $a_i'$, and we suppose that there are $m$ independent variables $e_1, e_2, \dots, e_m$, which do not depend on the relay positions. Such a circuit is equivalent to the circuit of Fig. 1, in which

$$f_i(a_1, a_2, \dots, a_n, e_1, e_2, \dots, e_m)$$

is the Boolean function which is zero when the switches $a_j$, $e_k$ are in such positions that the voltage across $R_i$ in the original circuit is sufficient to operate it, and one otherwise. The function

[...]    (1)

will be called the circuit discriminant. We also define the following terms. A steady state in a relay circuit corresponding to a given set of values $A_1, A_2, \dots, A_m$ of the independent variables is a set of positions $P_1, P_2, \dots, P_n$ of the relays such that if the independent variables are given the values $A_k$, and the relays are held in the positions $P_i$ long enough for the steady state fluxes in the coils to build up, the relays will remain in the same positions indefinitely.

A completely oscillatory state of a relay circuit is a set of values $A_1, A_2, \dots, A_m$ of the independent variables such that, no matter what the initial positions of the relays or how long they are held in those positions, when they are released at least one makes an infinite number of oscillations, i.e. chatters. In addition to these obviously exclusive possibilities, a circuit may be "partially" oscillatory for certain values of the independent variables: with some initial conditions the circuit chatters and with others it relapses into a steady state. An example is shown in Figure 2, where with the initial condition

$$R_1 = 0 \quad \text{(operated)}$$

the circuit chatters, while with [...] the circuit relapses into the steady state $R_1 = 1$, $R_2 = 1$.

Theorem I. For $P_1, P_2, \dots, P_n$ to be a steady state it is necessary and sufficient that

$$f(P_1, \dots, P_n, A_1, \dots, A_m) = 0.$$

This is necessary since in a steady state the contacts of [...] relay [...] so that [...]. It is sufficient since [...] so that if the relays are held in these positions $P_i$ long enough for fluxes to build up, they will remain there.


Theorem II. For $A_1, \dots, A_m$ to be completely oscillatory it is necessary and sufficient that

$$f(a_1, a_2, \dots, a_n, A_1, \dots, A_m) = 1$$

identically in the $a_i$. This is necessary since otherwise there is a set of $a_i$, say $P_i$, such that $f = 0$, and this is a steady state by Theorem I. It is sufficient since, if it holds, then with any starting position, say $P_1, \dots, P_n$, at least one term of the sum (1) is equal to one, so that [...] and one or the other [...]. Hence, after some relay has changed, we still have the same situation since $f = 1$, so that at least one relay makes an infinite number of changes of position.
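The two theorems can be tried out on very small circuits by brute force. The Python sketch below is only an illustration under stated assumptions: since equation (1) is illegible in the scan, it takes the discriminant to be the Boolean sum over all relays of $a_i \oplus f_i$ (one whenever some relay's position disagrees with what its coil circuit calls for). This is consistent with the proofs above but is our reconstruction. The helper names and the two one-relay circuits (`buzzer`, `lock`) are hypothetical examples, not circuits from the memo.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hindrance convention used in the memo: 0 = closed contact / operated relay,
# 1 = open contact / unoperated relay.
State = Tuple[int, ...]
RelayFn = Callable[[State, State], int]  # f_i(a, e): 0 if R_i's coil would operate


def discriminant(fs: Sequence[RelayFn], a: State, e: State) -> int:
    # ASSUMED form of (1): Boolean sum over i of (a_i XOR f_i), i.e. 1 whenever
    # some relay's position disagrees with what its coil circuit calls for.
    return int(any(a_i != f(a, e) for a_i, f in zip(a, fs)))


def steady_states(fs: Sequence[RelayFn], e: State) -> List[State]:
    # Theorem I: P is a steady state iff f(P, A) = 0.
    return [p for p in product((0, 1), repeat=len(fs)) if discriminant(fs, p, e) == 0]


def completely_oscillatory(fs: Sequence[RelayFn], e: State) -> bool:
    # Theorem II: completely oscillatory iff f = 1 for every choice of the a_i.
    return not steady_states(fs, e)


# Hypothetical one-relay examples (series = Boolean sum/OR, parallel = product/AND).
# A buzzer: coil fed through switch e_1 in series with its own break contact a_1'.
buzzer: List[RelayFn] = [lambda a, e: e[0] | (1 - a[0])]
# A locking relay: its own make contact a_1 in parallel with switch e_1.
lock: List[RelayFn] = [lambda a, e: e[0] & a[0]]

print(completely_oscillatory(buzzer, (0,)))  # switch closed: the buzzer chatters
print(steady_states(lock, (1,)))             # switch open: both positions are steady
```

The `lock` example shows the ambiguity mentioned at the start of the memo: with the switch open, both positions are steady states, and which one the relay occupies depends on its history.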


If $f(a_1, \dots, a_n, A_1, \dots, A_m)$ is a function of the $a_i$ (not identically one or zero), the system has some steady states, namely the roots of $f = 0$, but for arbitrary starting conditions we cannot say what the motion will be. Whether a circuit seeks out a steady state or not depends not only on the network topology, as in Fig. 2, but also on relay characteristics, as in Fig. 3. Here, if $R_1$ is slow operating and $R_2$ very fast, the circuit may chatter with both relays initially unoperated, for $R_2$ may never stay in long enough to operate $R_1$. If $R_1$ is fast and $R_2$ slow releasing, the system relapses into $R_1 = 0$, $R_2 = 1$. Hence no purely algebraic conditions can be set up to determine whether a circuit will relapse into a steady state when $f$ is a function of the $a_i$.



July 15, 1943

Copy No.

Professor W. Feller of Brown University and
Dr. C. E. Shannon of the Bell Telephone Laboratories



This is a report on investigations made at the request of Dr. Warren Weaver (letter of December 28, 1942). Our study has been based partly on oral information received in Aberdeen (January 18, 1942) and partly on the material contained in Report No. 319 of the Ballistic Research Laboratory ("Report on the Differential Analyzer at Aberdeen Proving Ground" by Major A. A. Bennett, December 1942). The technical set-up as described in that report will in the sequel be referred to as the "present set-up". It should be clearly understood that we were not to study possible technical improvements of the analyzer as such, nor to reexamine the theory underlying the differential equations. Accordingly, the present report is concerned only with an examination of the procedure of mechanical integration of the differential equations of ballistics as used at present. Furthermore, we have not considered any methods of integration other than on the differential analyzer.

Before proceeding to describe devices which might contribute to the efficiency of the analyzer, we wish to summarize some negative findings, as these may render superfluous similar investigations by other persons.

a) We have carefully investigated a great number of alternative set-ups, on the differential analyzer, of the differential equations, either in their present form or using various new variables. However, we have been unable to find any form superior to the method as used at present in Aberdeen, which, in our opinion, is the most efficient one.

b) We have studied the advisability of using some method of successive approximations. Such methods naturally present themselves, since one should expect them to reduce the ranges of the variables involved and thus increase the accuracy. However, a closer study will show that it is almost invariably necessary to subtract, on the analyzer, two large quantities which are themselves independently obtained on the analyzer. This, of course, nullifies the desired effect of reducing the ranges. Various possibilities have been studied, among them the possibility of starting with the vacuum trajectories and integrating the difference between them and the actual trajectories. Again we were unable to find a method which would appear superior to the present set-up. It will be noted, however, that the modification of the latter suggested below can in some sense be interpreted as the first step in a method of successive approximations.

c) Several perturbation methods and expansions according to various parameters have been tried, paying special attention to methods suggested in the newest Russian literature. None of these methods seems appropriate for the analyzer.

Coming to the less negative part of this report, we remark that an adequate theory of errors of the differential analyzer is not available at present. However, simple theoretical considerations based on experience gathered at M.I.T. make it appear that a very considerable part of the total error is due [...] of error are backlash and, perhaps even more, inaccuracies in the following mechanism for the input and vector tables. It seems therefore possible to achieve a gain in accuracy by reducing the range of the variables in the integrators, even though this may necessitate the introduction of new adders and gears. The following recommendations are based on this assumption. We proceed step by step, starting with the simplest case.

Recommendations.

1) Consider, to begin with, the horizontal displacement $x$. Obviously $dx/dt = \dot{x}$ will range from its maximum $\dot{x}_0$ at the beginning to some fraction of it, say $q\dot{x}_0$, at the end. Accordingly, when integrating in the usual form

$$x = \int \dot{x}\, dt, \qquad (1)$$

the integrand ranges from $q\dot{x}_0$ to $\dot{x}_0$. Now this means that only a fraction $\frac{1-q}{2}$ of the total range of the integrator disc is used, even if we suppose that the scale factor has been chosen in the best way (so that the rim of the integrator disc is used for values of $\dot{x}$ near $\dot{x}_0$). If, instead, we mechanize

$$x = \frac{1+q}{2}\,\dot{x}_0 t + \int \left(\dot{x} - \frac{1+q}{2}\,\dot{x}_0\right) dt, \qquad (2)$$

the integrand will range from its maximum $\frac{1-q}{2}\,\dot{x}_0$ to its minimum $-\frac{1-q}{2}\,\dot{x}_0$. This allows one to use a scale factor $\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize the entire integrator disc. This, of course, means a considerable gain.

Now the constant $\frac{1+q}{2}\,\dot{x}_0$ in the integral in (2) appears only as an initial displacement. It is therefore seen that the realization of the proposed set-up (2) requires, as compared with the customary set-up (1), an additional gear (to produce $\frac{1+q}{2}\,\dot{x}_0 t$) and an adder. The following figure shows the simplest mechanization.

[Figure: simplest mechanization of set-up (2).]

It goes without saying that the gear ratio does not need to be exactly $\frac{1+q}{2}\,\dot{x}_0$; any number near the middle of the range of the integrand will do the same service.

If used to its fullest extent, the system as described changes a previously positive variable into one also taking on negative values. Although only one change of sign is introduced, this will introduce some new backlash. Now, if instead of (2) we mechanize

$$x = q\dot{x}_0 t + \int \left(\dot{x} - q\dot{x}_0\right) dt, \qquad (3)$$

the new integrand does not change sign, and no new backlash is introduced. On the other hand, the optimum scale factor for (3) is only $\frac{1}{1-q}$ times that for (1), that is to say half the scale factor for (2). We conclude that with proper corrections for backlash the set-up (2) should prove best. However, if enough frontlash units are not available at Aberdeen, the set-up (3) may be tried with advantage.
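A quick numerical check of set-ups (1), (2) and (3) follows. It is a sketch under an assumed velocity history ($\dot{x}$ decaying exponentially from $\dot{x}_0$ to $q\dot{x}_0$, with $q = 0.4$), and the function names are hypothetical. It confirms that the three forms give the same $x$, and that the largest integrand shrinks by the factors $\frac{2}{1-q}$ and $\frac{1}{1-q}$ quoted above, taking the usable scale factor to be inversely proportional to the largest magnitude of the integrand.

```python
import numpy as np


def cumulative_integral(y, t):
    """Running trapezoidal integral of y with respect to t, starting at 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return out


def compare_setups(q=0.4, xdot0=1.0, T=10.0, n=10_001):
    t = np.linspace(0.0, T, n)
    # Hypothetical velocity history: xdot falls exponentially from xdot0 to q * xdot0.
    xdot = xdot0 * np.exp(np.log(q) * t / T)
    c2 = 0.5 * (1 + q) * xdot0  # offset of set-up (2): middle of the range
    c3 = q * xdot0              # offset of set-up (3): bottom of the range
    x1 = cumulative_integral(xdot, t)
    x2 = c2 * t + cumulative_integral(xdot - c2, t)
    x3 = c3 * t + cumulative_integral(xdot - c3, t)
    # The best scale factor is inversely proportional to the largest |integrand|.
    peak1 = np.max(np.abs(xdot))
    gain2 = peak1 / np.max(np.abs(xdot - c2))
    gain3 = peak1 / np.max(np.abs(xdot - c3))
    return x1, x2, x3, gain2, gain3


x1, x2, x3, gain2, gain3 = compare_setups()
print(np.allclose(x1, x2), np.allclose(x1, x3))  # all three set-ups give the same x
print(gain2, 2 / (1 - 0.4))                       # scale-factor gain of set-up (2)
print(gain3, 1 / (1 - 0.4))                       # scale-factor gain of set-up (3)
```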

2) A similar device can obviously be used wherever the range of the integrand does not utilize the integrator disc to its fullest extent. This is true for almost all integrators whose outputs are:

(i) the horizontal displacement $x$,
(ii) $s = \int v\, dt$, $v$ being the speed,
(iii) $e^{-hy}$, where $y$ is the height.

In the first two cases the new set-up would not produce any additional loading, since the integrators are driven by the independent variable motor. In other cases an additional loading would ensue, which may have to be compensated by the use of a larger scale factor on the t-shaft; this would indirectly slow down the machine. Whether this will have to be done is impossible to predict theoretically. Should it prove necessary, it would be for the user to decide whether the gain in accuracy is worth the loss in speed.

3) If the above described device should prove in[...] at the expense of [...] following improvement [...] considerable manual work and loss [...]. The process of integration may be stopped at convenient [...] intervals. Consider, for example, [...] as indicated in the figure.

Here, even the usual procedure of integration utilizes the entire range of the integrator disc, and no gain can be achieved by means of the device as described above. However, the integrand may conveniently be treated by a double application of this device, splitting the interval of integration into two parts. In other words, instead of a given function $f(x)$ we integrate the difference between $f(x)$ and a step-function. The output of the integrator is no longer $F(x)$, but the difference between $F(x)$ and a triangular (or "roof") function.




Similarly, with a convenient subdivision we may use any step- 
function for the integrand and the corresponding polygonal 
line for the integral. 

This procedure obviously requires resetting the integrator in question and changing one gear ratio each time the machine is stopped. On the other hand, the increase of the scale factor is roughly proportional to the number of subintervals.
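The step-function device of 3) can be sketched the same way. The integrand $\sin 2\pi x$ on $[0, 1]$ is a hypothetical example that already sweeps the whole disc. As an assumption, the step function is taken to be the middle of the integrand's range on each of $k$ equal subintervals. The sketch checks that the integrator output plus the polygonal correction reproduces $F(x)$, and prints the possible increase in scale factor for several $k$.

```python
import numpy as np


def cumulative_integral(y, x):
    """Running trapezoidal integral of y with respect to x, starting at 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


def midrange_steps(f_vals, x, k):
    """Step function equal to the middle of f's range on each of k equal subintervals of [0, 1]."""
    idx = np.minimum((x * k).astype(int), k - 1)
    s = np.empty_like(f_vals)
    for j in range(k):
        mask = idx == j
        s[mask] = 0.5 * (f_vals[mask].max() + f_vals[mask].min())
    return s


x = np.linspace(0.0, 1.0, 16_001)
f = np.sin(2 * np.pi * x)  # hypothetical integrand that already spans the whole disc
F = cumulative_integral(f, x)

for k in (1, 2, 4, 8, 16):
    s = midrange_steps(f, x, k)
    integrator_output = cumulative_integral(f - s, x)  # what the integrator now delivers
    polygon = cumulative_integral(s, x)                # piecewise-linear ("roof") correction
    assert np.allclose(integrator_output + polygon, F)
    gain = np.max(np.abs(f)) / np.max(np.abs(f - s))   # possible increase in scale factor
    print(k, round(gain, 2))
```

With one subinterval there is no gain, as stated above. For this particular integrand the gain grows with $k$, approaching roughly $k/\pi$ for large $k$.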

4) In principle this procedure may be looked upon as a special case of the following more general method. Instead of

$$v(x) = \int y\, dx \qquad (4)$$

we mechanize

$$v(x) + \phi(x) = \int \left(y + \phi'(x)\right) dx, \qquad (5)$$

where $\phi(x)$ is an arbitrary function and $\phi'(x)$ its derivative. In practice, of course, $\phi(x)$ should be chosen so as to render the maximum of $|y + \phi'|$ as small as possible, in order to increase the scale factor on the integrator. Now if $\phi(x)$ is not a linear function, the mechanization of (5) would require two new input tables or their equivalent. However, the possibility of obtaining some special $\phi(x)$ by means of non-circular gears should not be overlooked. This would mean a considerable improvement of the linear method.
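Equation (5) can be illustrated with a hypothetical integrand $y = 3x^2$ (so that $v = x^3$) and two choices of $\phi$: the linear one of recommendation 1), and $\phi(x) = -1.5x^2$ as a stand-in for a function a non-circular gear might supply. Both recover $v(x)$ once $\phi(x)$ is subtracted. Here $\phi(0) = 0$, so no extra initial displacement is needed. The ratio of the original to the new largest integrand indicates how much the integrator's scale factor could be raised.

```python
import numpy as np


def cumulative_integral(y, x):
    """Running trapezoidal integral of y with respect to x, starting at 0."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


x = np.linspace(0.0, 1.0, 10_001)
y = 3 * x**2  # hypothetical integrand; the exact v(x) is x**3

# Linear phi (the device of recommendation 1): phi = -c x, with c in the middle of y's range.
c = 0.5 * (y.max() + y.min())
# Hypothetical non-linear phi = -1.5 x**2, so phi' = -3x (e.g. supplied by a non-circular gear).
phi = -1.5 * x**2
dphi = -3.0 * x

v_linear = cumulative_integral(y - c, x) + c * x   # (5) solved for v with phi = -c x
v_curved = cumulative_integral(y + dphi, x) - phi  # (5) solved for v with the curved phi

print(np.allclose(v_linear, x**3, atol=1e-6), np.allclose(v_curved, x**3, atol=1e-6))
print("gain, linear phi:    ", y.max() / np.abs(y - c).max())     # 2.0
print("gain, non-linear phi:", y.max() / np.abs(y + dphi).max())  # 4.0
```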

5) We have been asked by Dr. Dederick to consider whether it would be advantageous to generate [...] from an input table (instead of by integration, as at present). The foregoing remarks contain an answer to this question. It is not difficult to see that the present method of obtaining the function by integration is more efficient. It would probably become even more so if recommendation 2) were put into practice.

6) Although it is in no direct connection with the subject of this report, we enclose an Appendix describing a simplified method for computing gear ratios. This method is based on previous experience (of one of us) at M.I.T. and may prove useful in connection with ballistic work on the Aberdeen Analyzer.

W. Feller
Brown University, Providence, R.I.

C. E. Shannon
Bell Telephone Laboratories, N.Y.

May 27, 1943.





Appendix

In this appendix a simplified method of determining gear ratios for an analyzer set-up will be described, which was used for some time on the M.I.T. analyzer and proved in general to be considerably faster and easier to change than the original method of equalities and inequalities. The method may be briefly outlined as follows:

1. Draw the set-up with an unknown gear ratio in each shaft of limited displacement. An unspecified ratio is also placed in the two inputs of each adder.

2. Calculate an approximate scale factor on the independent variable to give the expected time of solution at the average rate at which it turns. Choose an exact scale factor near this approximate one which is a "round figure" in terms of obtainable gear ratios, i.e., factorable into a small number of simple rationals.

3. Choose in the same way scale factors for all shafts of limited displacement (integrator inputs, and function table inputs and outputs) so as not to exceed their limits with expected displacements.

4. This fixes, by division and from the integrating factor of the integrators, the scale factors and gear ratios of all shafts except those containing adders. In the case of adders, the input shaft with the smallest scale factor fixes the scale factor of the adder, the other input being geared down to the same scale factor. The output gear in the adder is then [...]

5. The set-up is then inspected to see that no integrators or other parts are too heavily loaded. If they are, reduction gears are transferred from inputs to outputs to reduce loads when possible; otherwise the scale factor on the independent variable is increased.

In case the ratios come out too complicated, different scale factors are chosen in Step 3. With a little practice and foresight, however, it is possible to obtain suitable ratios on the first trial.
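Step 2 of the appendix, finding a "round figure" near a desired scale factor, amounts to a small search. The sketch below assumes a hypothetical stock of simple ratios $p/q$ with $1 \le p, q \le 6$ (the memo does not list the gears actually available), and returns the product of at most three such ratios closest to the target, preferring fewer gears on ties.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import prod

# HYPOTHETICAL stock of "simple" ratios p/q with 1 <= p, q <= 6; the memo does not
# say which gears were available on the M.I.T. or Aberdeen machines.
SIMPLE_RATIOS = sorted({Fraction(p, q) for p in range(1, 7) for q in range(1, 7)})


def round_scale_factor(target, max_gears=3):
    """Return (relative error, ratio, gears) for the product of at most
    max_gears simple ratios that lies closest to target.

    Combinations with fewer gears are tried first and only a strictly better
    fit replaces them, so ties go to the simpler gearing.
    """
    best = None
    for k in range(1, max_gears + 1):
        for combo in combinations_with_replacement(SIMPLE_RATIOS, k):
            value = prod(combo, start=Fraction(1))
            error = abs(float(value) - target) / target
            if best is None or error < best[0]:
                best = (error, value, combo)
    return best


error, value, gears = round_scale_factor(7.3)
print(value, float(value), [str(g) for g in gears], f"{error:.2%}")
```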



Two New Circuits for Alternate Pulse Counting

The well known W-Z relay circuit is shown in Fig. 1. A is a pulsing contact which is alternately opened and closed. Indicating closure of contacts by 0 and openness by 1, and for relays 0 for operated (up) and 1 for unoperated (down), the circuit goes through the following periodic cycle of operation:









[...]




Thus one complete cycle requires two complete pulses on A. 
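Because the operating-cycle table of Fig. 1 is not legible, the following Python sketch models only an assumed abstract state sequence, not the contact network: while A is closed, W settles to the complement of Z, and while A is open, Z settles to W. Under that assumption each change of A moves exactly one relay, both relays are up while A is open (consistent with the remark later in this note about the standard circuit), and the sequence repeats after two complete pulses. The function name is hypothetical.

```python
def wz_cycle(pulses):
    """Abstract state sequence (A, W, Z) of a two-relay alternate-pulse counter.

    Memo's convention: 0 = contact closed / relay operated (up), 1 = open / unoperated.
    ASSUMED behaviour (the state table in the scan is illegible): while A is closed,
    W settles to the complement of Z; while A is open, Z settles to W.
    """
    a, w, z = 1, 1, 1  # start with A open and both relays down
    history = [(a, w, z)]
    for _ in range(pulses):
        for a in (0, 1):  # one complete pulse: A closes, then opens
            if a == 0:
                w = 1 - z
            else:
                z = w
            history.append((a, w, z))
    return history


for state in wz_cycle(2):
    print(state)
```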

This note describes two apparently new circuits 
which perform the same function. These are shown in Fig. 2 
and Fig. 3. The operating cycles for these are:

[...]



















These three circuits may be compared with regard to the number of elements required as follows:

Figure   Relays   Contacts                    Resistances
1        2        1 continuity, 1 transfer    2
2        2        2 continuity, 1 break       1
3        2        2 transfer, 1 make          1

In Fig. 3 the resistance is theoretically superfluous; if the transfer elements could be trusted never to be shorted it could be omitted, but in practice it would be necessary to avoid shorts when the relays were being adjusted. Figs. 2 and 3 are essentially duals, and 3 was obtained from 2 by the duality theorem.

In Fig. 2 it may be noted that the two relays are up when A is closed, while in the standard circuit they are both up when A is open. This might be desirable in some applications. Fig. 3 has the possible disadvantage that both ends of the pulsing contact A are connected into the circuit, while in 1 and 2 one end can be grounded.



[Figs. 1, 2 and 3: the W-Z circuit and the two new alternate pulse counting circuits.]

Counting Up or Down with Pulse Counters

With binary counters of either relay or electronic type it is possible by simple modification to count both up and down. Suppose the largest number that can be registered is $L$. Defining the complement of any number $N$ by $L - N$, we note that subtracting a number $M$ from $N$ is equivalent to adding $M$ to the complement of $N$ and complementing the result, since $L - [(L - N) + M] = N - M$. Thus if in a binary counter we take the complement of a reading (which means locking up the relays which are down and vice versa in the relay case, and putting out the tubes which are conducting and vice versa in the electronic case), then let the counter continue adding the number of pulses in question, and finally take the complement again, we have subtracted the number. Actually, however, this process can be done simply by transferring the carryover leads to the opposite digit (tube or relay). In the relay case this amounts to a transfer contact between each adjacent pair of digits, and an additional make contact. In the electronic case the carryover leads go from the [...] tube plate to grids on the next stage. Here we could insert an electronic transfer contact, as shown, for example, in Figure 1. When we wish to add, the common control lead for "add" is given cutoff voltage, the "subtract" lead a large negative voltage. A positive impulse on the "one" plate of a stage then causes one side of the double triode to conduct, giving a negative impulse to the next grid for a carryover. For subtraction the voltages on the control leads are reversed, and carryover occurs when the "zero" plate voltage increases, i.e., when this tube goes out.

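Both forms of subtraction described above can be sketched in Python. The first function checks the complement rule with $L = 2^{\text{bits}} - 1$ (every digit reversed). The second models a ripple counter whose carry is taken on a 1 to 0 change when adding and, with the carry lead transferred to the opposite digit, on a 0 to 1 change when subtracting. The function names are hypothetical, and the model ignores all circuit timing.

```python
def complement(n, bits):
    """Complement L - N with L = 2**bits - 1: every digit (relay or tube) reversed."""
    return (2**bits - 1) - n


def subtract_by_complement(n, m, bits):
    """Complement, count m pulses forward (wrapping like a real counter), complement again."""
    return complement((complement(n, bits) + m) % 2**bits, bits)


def pulse(digits, subtract=False):
    """Apply one input pulse to a ripple counter; digits are least significant first.

    Adding: a stage passes a carry on when it goes 1 -> 0 (its "one" side goes out).
    Subtracting: the carry lead is moved to the opposite side, so the carry is passed
    when the stage goes 0 -> 1 (its "zero" side goes out).
    """
    for i, old in enumerate(digits):
        digits[i] = 1 - old
        carry = (old == 0) if subtract else (old == 1)
        if not carry:
            break
    return digits


def value(digits):
    """Integer registered by a list of digits, least significant first."""
    return sum(d << i for i, d in enumerate(digits))


print(subtract_by_complement(13, 5, 4))  # 8
counter = [1, 0, 1, 0]                   # 5, least significant digit first
for _ in range(3):
    pulse(counter, subtract=True)
print(value(counter))                    # 2
```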

Cover Sheet for Technical Memoranda
Research Department

Subject: Circuits for a P.C.M. Transmitter and Receiver - Case 20878

S.A.S., H.W.B., H.F.

4 - G. W. Gilman
5 - H. W. Bode
6 - A. G. Jensen
7 - W. M. Goodall
8 - E. Peterson
9 - H. S. Black
10 - W. F. Simpson - Patent Dept.
11 - J. R. Pierce
12 - R. L. Dietzold
13 - C. B. Feldman
14 - W. T. Wintringham
15 - F. B. Llewellyn
16 - C. H. Elmendorf
17 - B. M. Oliver
18 - C. E. Shannon


Date: June 1, 1944
Authors: C. E. Shannon and [...]


Circuits are described for a P.C.M. transmitter and receiver. The transmitter operates on the principle of counting, in the binary system, the number of quanta of charge required to nullify the sampled voltage.



Circuits for a P.C.M. Transmitter and Receiver - Case 20878

June 1, 1944


The circuits shown in the present memorandum are intended to fill the boxes of the block functional designs for a PCM transmitter and receiver shown in Fig. 6 of a December 1943 memorandum (MM-43-110-43). The transmitter functional diagram is shown here as Fig. 1, and the general operation is as follows. The incoming signal is sampled periodically by closing electronic switch 1 with periodic impulses from the timer. This charges condenser C to the sampled voltage, and the electronic switch opens after each impulse, isolating the condenser from the signal. The existence of a voltage across the condenser causes the comparator to close electronic switch 2, which allows pulses of charge to feed into the condenser from the pulse generator, discharging the condenser. The number of these pulses is counted in the binary system by the binary counter, and when the condenser is reduced to a reference voltage, the comparator opens electronic switch 2. Near the end of the sampling period the binary counter is connected to the distributor, which registers the binary number counted, and the counter is then reset to zero; both of these operations are controlled by impulses from the timer. The distributor then sends a pulse, or no pulse, down the output line for each binary digit, according as the digit is 1 or 0. These digits are sent in reverse order, the least significant being sent first, to tie in with the contemplated receiver circuit.
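An idealized model of the transmitter loop and distributor is sketched below in Python. It assumes a fixed charge quantum per pulse, a reference voltage of zero, and five binary digits (matching the five distributor leads 11 to 15 described later). The function names and numbers are illustrative only.

```python
def count_quanta(v_sample, quantum, v_ref=0.0, max_count=31):
    """Remove one quantum of charge per pulse until the condenser reaches the reference.

    Idealized version of the loop of Fig. 1: switch 2 stays closed (pulses keep
    arriving) while the comparator sees the voltage above v_ref. max_count is a
    hypothetical stand-in for the capacity of a five-digit binary counter.
    """
    v, count = v_sample, 0
    while v > v_ref and count < max_count:
        v -= quantum
        count += 1
    return count


def distribute(count, digits=5):
    """Binary digits of the count, least significant first: 1 = pulse sent, 0 = no pulse."""
    return [(count >> i) & 1 for i in range(digits)]


n = count_quanta(2.6, 0.25)
print(n, distribute(n))  # 11 [1, 1, 0, 1, 0]
```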

The specific circuits are shown in Figs. 2 to 8, and 
detailed descriptions of their operation follow. 

Fig. 2 shows electronic switch 1, which charges condenser C to the signal voltage at the sampling times. The signal wave is biased up so that its minimum value is slightly positive, and is impressed on terminal 1 as a voltage; i.e., the signal source as seen from terminal 1 is assumed to be of low impedance. The timer, at the sampling time, puts a positive pulse on terminal 2, which is inverted by the triode to give a negative pulse on the pentode control grid. This causes the pentode, which was previously conducting, to cut off. Before the pulse, condenser C had a small minimum positive charge and neither diode was conducting, since the plates were held at a low positive potential by the pentode current. As the pentode cuts off, the diode plates swing positive and the right hand diode starts to conduct, charging the condenser. As this condenser voltage builds up exponentially, the voltage on the diode plates also increases positively until it reaches the signal voltage, and at that instant the left hand diode starts to conduct. The voltage stops rising at this point, since the plates are now essentially short circuited to the low impedance signal source. This all occurs during the timing pulse, and at the end of this pulse the pentode again starts conducting, dropping the diode plates to a small positive voltage, less than the minimum signal voltage, and isolating the condenser.

Fig. 3 shows a standard multi-vibrator circuit for 
giving a series of square pulses. The coil condenser cross 
connection of plates to grids causes the grid transient to 
be a cosine curve which crosses the cut off grid voltage at 
a time determined essentially by the LC product and independent 
of amplitude changes due to variations in plate supply, etc. 
As this point determines the period of oscillation, the 
oscillator has good frequency stability. The output appears 
on terminal 6 as a square wave. 

Fig. 4 is the comparator, which is actually only a differential amplifier with sufficient gain that the granularity voltage applied to the input is capable of driving the amplifier from saturation in one direction to saturation in the other. The input is the voltage on condenser C, which, immediately after a sampling instant, will be at the sampled signal voltage. This voltage starts decreasing by steps as the condenser is discharged, and when the condenser voltage applied to terminal 3 moves down the step which crosses the differential amplifier threshold, the amplifier swings from saturation with output terminal 5 at nearly zero voltage to a high negative voltage.

The electronic switch 2 is shown in Fig. 5. This 
circuit sends units of charge into the condenser through 
terminal 3 under the control of the comparator output coming 
in on terminal 5. The multi-vibrator output is connected to 
terminal 6 and the output of the multi-grid tube will be a 
square wave when 5 is positive, which ceases when the 
comparator swings to the other saturation point driving the 
voltage on 5 in the negative direction. The double diode 
connection gives a pump action. When the plate voltage of 
the multi-grid tube increases to the upper part of the square 
wave, the charge flows into the condenser from terminal 4 
through the left diode. During the lower part of this wave the condenser discharges through the right diode out into condenser C, via terminal 3. As this causes the potential of 3 to decrease gradually down a step function, it is necessary for the input voltage at 4 to decrease similarly; otherwise the difference in voltage between 3 and 4 would cause the size of the quanta to decrease gradually. This lowering of the voltage on 4 is accomplished by a cathode follower arrangement on the first cathodes in the comparator, which follow the step voltage.

The binary counter is shown in Fig. 6. The descending 
step voltage which appears on condenser C is applied to the 
input of this circuit through terminal 3. The input resistance 
condenser combination serves as a differentiating circuit (the 
time constant fairly small compared to the time between steps) 
so that the voltage applied to the first grid of the double 
triode consists of a series of negative spikes. The double 
triode is simply a two stage resistance coupled amplifier, and 
its output feeds the binary counter digit tubes. This circuit 
is of standard type with two pentodes in each stage and there 
are two stable points for each stage, one with the upper tube 
cut off and the lower tube conducting, and the other, the converse
situation. A negative impulse from a preceding stage
applied through the coupling condensers changes the state from 
the previous stable condition to the opposite one. This impulse 
is applied symmetrically to both suppressors, but the condenser 
across the cathode resistances, charged in one direction from 
the previous state, biases the choice of the next state toward 
the opposite one. The control grids of the "zero" tubes (the 
upper row which are conducting when the corresponding binary 
digits are zero) are connected to a common control lead which 
is used to reset the reading to zero after the reading is registered
by the distributor. This is accomplished by a negative impulse from
the timer. The outputs to the distributor
are taken off the plates of the "unit" tubes.

The distributor is shown in Fig. 7. After the number of quanta of charge has been counted in the binary counter, the leads 11, 12, 13, 14, 15 will have either low positive voltages or B+, according as the corresponding digit is one or zero. The grids of the left triodes will then be either negative or positive from the potentiometer action to the negative voltage C-. To register the counter reading, a positive pulse from the timer is applied to the control grid of the common pentode, allowing it to conduct and pulling the cathode of the left triode and the diode in all stages negatively. If a digit is zero, the potential of the cathodes in that stage stops at a positive value due to current through the triode, and the diode does not conduct. If the digit is one, the cathodes are pulled negative and the corresponding condenser $C_0$ is discharged through the diode and pentode. At the end of the registering pulse, the cathodes go positive again, isolating each $C_0$, with the digit registered as presence or absence of charge. The reading is taken off the series of condensers $C_0$ in sequence by positive pulses from the timer on leads 21, 22, 23, 24, 25. These pulses allow the right hand triodes to conduct and each $C_0$ in turn to charge through the output lead, leaving them in the normal state (at a voltage about equal to the pulse voltage). If the digit is "zero", no charging of $C_0$ from the output lead occurs. Thus negative pulses appear on the output when and only when the registered digits are one.

The timer system is shown in Fig. 8. An oscillator 
which may be synchronized subharmonically with the pulse 
generating multi-vibrator, operates at the sampling frequency. 
This passes through the clipper amplifier to give a square 
wave, which is differentiated to give alternating positive 
and negative spikes. A second clipper amplifier eliminates 
the negative spikes and makes the positive ones rectangular. 
These short rectangular pulses are fed into a delay line 
terminated in its characteristic impedance. The timing pulses 
needed for the various circuit functions are tapped off at 
the appropriate places as indicated. A synchronizing pulse 
may also be taken off the same delay line. 

Fig. 9 shows the receiver circuit. The signal 
passes through the clipping amplifier which is adjusted to give 
a saturation voltage on the output if a pulse is present and 
none if absent. This output is applied to the grid of a 
multigrid pentode, whose other control grid is given positive 
gating pulses at the center of the digit intervals. These 
gating pulses allow the pentode to conduct if a pulse is present 
and the plate current is then independent of the plate voltage 
(providing this stays within certain limits) so that if a 
pulse is present, a fixed amount of charge (equal to the 
length of the gate times the pentode current) flows onto the 
condenser. The time constant of the R C system (including the 
pentode load resistance) is adjusted to allow the voltage to 
restore itself halfway toward the equilibrium value in the 
time from one digit to the next, so that after all pulses have been collected on the condenser, the charge contributions of the first, second, third, etc. digits have decayed by factors of $2^{-n}, 2^{-n+1}, \dots$, $n$ being the number of digits. At this time a positive gating pulse is put on the grid of the second pentode, allowing the condenser to discharge rapidly into the low pass filter. The timer system can be realized with the systems shown in either Fig. 10 or Fig. 11.
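The receiver's decoding can be modeled in a few lines. The sketch assumes ideal halving of the stored charge in each digit interval and a read-out gate one digit interval after the last digit. Under these assumptions, with $n$ digits the received charge is $q_0 N / 2^n$ for the binary number $N$. The function name and parameters are hypothetical.

```python
def receive(pulses, q0=1.0):
    """Idealized receiver condenser of Fig. 9 (digits arrive least significant first).

    Each pulse present adds a fixed charge q0; the R C decay then halves the stored
    charge before the next digit. ASSUMPTION: the final read-out gate comes one digit
    interval after the last digit, so the first digit has been halved n times.
    """
    charge = 0.0
    for d in pulses:
        charge += d * q0
        charge /= 2
    return charge


digits = [1, 1, 0, 1, 0]                  # 11, least significant digit first
print(receive(digits) * 2**len(digits))   # 11.0
```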



Figs. 1 to 11 




Pulse Shape to Minimize Bandwidth with Nonoverlapping Pulses

We consider the problem of shaping pulses $\phi(t)$, which are zero outside $(-L, L)$, in such a way as to minimize the band width of the power spectrum of the ensemble of functions formed by sending a sequence of the functions $\phi(t)$ and 0, with spacing $2L$, the probabilities of either being 1/2. First a theorem will be proved on the spectrum of such ensembles of functions.

Theorem: Let an ensemble of functions be defined by

$$f(t) = \sum_{n=-\infty}^{\infty} a_n \phi(t + 2nL),$$

where the $a_n$ are chosen independently and are equally likely to be one or zero. The power spectrum of $f(t)$ then consists of two parts: a point spectrum consisting of the spectrum of $\frac{1}{2}\sum_n \phi(t + 2nL)$, i.e. the spectrum of $\phi(t)$ repeated, and a continuous part consisting of the energy spectrum of $\phi(t)$.

Consider the autocorrelation of $f(t)$:

4{ki - U» |f J *f <*> f(t»k) dt 

Y^OO _-r 
• U» A /*£ e{t***n) £ n* o(t**»m»>>} dt 

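The integrand can be written (using $a_n^2 = a_n$)

$$\sum_{n \ne m} a_n a_m\, \phi(t+2\pi n)\, \phi(t+\tau+2\pi m) + \tfrac12 \sum_n a_n\, \phi(t+2\pi n)\, \phi(t+\tau+2\pi n) + \tfrac12 \sum_n a_n\, \phi(t+2\pi n)\, \phi(t+\tau+2\pi n).$$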

Taking the average, the sum of the first two parts gives the
autocorrelation of the function $\tfrac12 \sum_n \phi(t + 2\pi n)$, since the coefficients
$a_n a_m$ $(n \ne m)$ have one chance in four of being both equal to one,
and in the second term $\tfrac12 a_n$ has the same mean value.

The last term in the limit reduces to

$$\frac{1}{8\pi} \int_{-\infty}^{\infty} \phi(t)\, \phi(t+\tau)\, dt,$$

the factor $\frac{1}{2\pi}$ (one pulse per interval $2\pi$) compensating for the number of terms.
These two parts give the discrete and continuous parts
of the spectrum, the first being the autocorrelation of $\phi(t)$
repeated and the second giving the energy spectrum of $\phi(t)$.

In case $\phi(t)$ is contained in $(-\pi, \pi)$, the discrete part has
power at $\omega = 0, 1, 2, 3, \ldots$ according to

$$\tfrac12 \sum_n \phi(t + 2\pi n) = \frac{A_0}{2} + \sum_{k \ge 1} \left(A_k \cos kt + B_k \sin kt\right).$$

Suppose we wish to shape $\phi(t)$, lying within $(-\pi, \pi)$, so
as to minimize the band spread of the spectrum as
measured by

$$W = \int_0^\infty \omega^2 P(\omega)\, d\omega,$$

where $P(\omega)$ is the power spectrum.

The contributions of the two parts of the spectrum can be added,
and that from the discrete part is [expression illegible in the scan].

For the continuous part, using the theorem that
$\int \omega^2 F^2(\omega)\, d\omega = \int [\phi'(t)]^2\, dt$, where $F(\omega)$ and $\phi(t)$ are Fourier
transforms, we have [expression illegible in the scan], i.e., the same
as the discrete contribution. The total $W$ is therefore
[expression illegible in the scan].

To minimize $W$ with a fixed total energy per pulse
and with boundary conditions $\phi(\pi) = \phi(-\pi) = 0$, we must obviously
place all the energy in the first term: a cosine curve displaced
so as to be tangent to the $t$ axis, i.e. $\phi(t) \propto 1 + \cos t$ for $|t| \le \pi$.




Cover sheet for technical memoranda 

Research Department 
subject: A Mathematical Theory of Cryptography - Case 20878



1 - [illegible]-Case Files
2 - Case Files
3 - [illegible]
4 - [illegible]
5 - S. Black
6 - B. Llewellyn
7 - [illegible]
8 - M. Oliver
9 - E. Potter
10 - B. H. Feldman
11 - C. Mathes
12 - V. L. Hartley
13 - R. Pierce
14 - W. Bode
15 - L. Dietzold
16 - A. MacColl
17 - A. Shewhart
18 - A. Schelkunoff
19 - E. Shannon
20 - Dept. 1000 Files

MM-45-110-92
date September 1, 1945
author C. E. Shannon
index no. P0.4

Abstract


A mathematical theory of secrecy systems is 
developed. Three main problems are considered. (1) A 
logical formulation of the problem and a study of the 
mathematical structure of secrecy systems. (2) The 
problem of "theoretical secrecy," i.e., can a system be
solved given unlimited time, and how much material must
be intercepted to obtain a unique solution to cryptograms?
A secrecy measure called the "equivocation" is defined
and its properties developed. (3) The problem of
"practical secrecy." How can systems be made difficult
to solve, even though a solution is theoretically
possible?

[Security classification stamp, partially illegible.]


A Mathematical Theory of Cryptography - Case 20878 (4)

September 1, 1945 
Index P0.4 

Introduction and Summary

[Stamp: DOD DIR 5200.10]

In the present paper a mathematical theory of
cryptography and secrecy systems is developed. The entire
approach is on a theoretical level and is intended to
complement the treatment found in standard works on cryptography.*
There, a detailed study is made of the many standard types of
codes and ciphers, and of the ways of breaking them. We will
be more concerned with the general mathematical structure and
properties of secrecy systems.

The presentation is mathematical in character. We
first define the pertinent terms abstractly and then develop
our results as lemmas and theorems. Proofs which do not
contribute to an understanding of the theorems have been placed
in the appendix.

The mathematics required is drawn chiefly from 
probability theory and from abstract algebra. The reader is 
assumed to have some familiarity with these two fields. A 
knowledge of the elements of cryptography will also be help- 
ful although not required. 

The treatment is limited in certain ways. First,
there are two general types of secrecy system: (1) concealment
systems, including such methods as invisible ink, concealing a
message in an innocent text or in a fake covering cryptogram,
or other methods in which the existence of the message is
concealed from the enemy; (2) "true" secrecy systems, where the
meaning of the message is concealed by cipher, code, etc.,
although its existence is not hidden. We consider only the
second type; concealment systems are more of a psychological
than a mathematical problem. Secondly, the treatment is limited
to the case of discrete information, where the information to
be enciphered consists of a sequence of discrete symbols, each
chosen from a finite set. These symbols may be letters in a
language, words of a language, amplitude levels of a "quantized"
speech or video signal, etc., but the main emphasis and thinking
has been concerned with the case of letters. A preliminary
survey indicates that the methods and analysis can be
generalized to study continuous cases, and to take into account the
special characteristics of speech secrecy systems.

*See, for example, H. F. Gaines, "Elementary Cryptanalysis," or
M. Givierge, "Cours de Cryptographie."

The paper is divided into three parts. The main
results of these sections will now be briefly summarized. The
first part deals with the basic mathematical structure of
language and of secrecy systems. A language is considered for
cryptographic purposes to be a stochastic process which
produces a discrete sequence of symbols in accordance with some
system of probabilities. Associated with a language there
is a certain parameter D which we call the redundancy of the
language. D measures, in a sense, how much a text in the
language can be reduced in length without losing any
information. As a simple example, if each word in a text is repeated,
a reduction of 50 per cent is immediately possible; further
reductions may be possible due to the statistical structure of
the language, the high frequencies of certain letters or words,
etc. The redundancy is of considerable importance in the study
of secrecy systems.

A secrecy system is defined abstractly as a set of
transformations of one space (the set of possible messages)
into a second space (the set of possible cryptograms). Each
transformation of the set corresponds to enciphering with a
particular key and the transformations are supposed reversible
(non-singular) so that unique deciphering is possible when the
key is known.

Each key and therefore each transformation is assumed
to have an a priori probability associated with it: the
probability of choosing that key. The set of messages or message
space is also assumed to have a priori probabilities for the
various messages, i.e., to be a probability measure space.

In the usual cases the "messages" consist of sequences
of "letters." In this case, as noted above, the message space is
represented by a stochastic process which generates sequences of
letters according to some probability structure.

These probabilities for various keys and messages are
actually the enemy cryptanalyst's a priori probabilities for
the choices in question, and represent his a priori knowledge
of the situation. To use the system a key is first selected
and sent to the receiving point. The choice of a key determines
a particular transformation in the set forming the system. Then
a message is selected and the particular transformation applied
to this message to produce a cryptogram. This cryptogram is
transmitted to the receiving point by a channel that may be
intercepted by the enemy. At the receiving end the inverse
of the particular transformation is applied to the cryptogram
to recover the original message.

If the enemy intercepts the cryptogram he can
calculate from it the a posteriori probabilities of the various
possible messages and keys which might have produced this
cryptogram. This set of a posteriori probabilities constitutes
his knowledge of the key and message after the interception.*
The calculation of these a posteriori probabilities is the
generalized problem of cryptanalysis.
As an example of these notions, in a simple
substitution cipher with random key there are 26! transformations,
corresponding to the 26! ways we can substitute for 26
different letters. These are all equally likely and each
therefore has an a priori probability 1/26!. If this is applied
to "normal English," the cryptanalyst being assumed to have no
knowledge of the message source other than that it is English,
the a priori probabilities of various messages of N letters
are merely their frequencies in normal English text.
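The key count and the a priori probability of each key in this example can be checked directly. A small Python sketch, in which a key is represented (for illustration only) as a dictionary from plaintext letters to cipher letters and drawn with a seeded `random.Random`:

```python
import math
import random
import string

ALPHABET = string.ascii_uppercase

def random_substitution_key(rng):
    """Choose one of the 26! simple-substitution keys, all equally likely."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def encipher(message, key):
    return "".join(key[c] for c in message)

def decipher(cryptogram, key):
    inverse = {v: k for k, v in key.items()}   # the transformation is reversible
    return "".join(inverse[c] for c in cryptogram)

num_keys = math.factorial(26)   # number of transformations in the system
prior = 1 / num_keys            # a priori probability of each key

key = random_substitution_key(random.Random(0))
c = encipher("ATTACKATDAWN", key)
print(num_keys, prior)
print(c, decipher(c, key))
```

`num_keys` is 403291461126605635584000000, so each key has an a priori probability of roughly $2.5 \times 10^{-27}$.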

If the enemy intercepts N letters of cryptogram in
this system his probabilities change. If N is large enough
(say 50 letters) there is usually a single message of a posteriori
probability nearly unity, while all others have a total
probability nearly zero. Thus there is an essentially unique "solution"
to the cryptogram. For N smaller (say N = 15) there will
usually be many messages and keys of comparable probability,
with no single one nearly unity. In this case there are multiple
"solutions" to the cryptogram.

Considering a secrecy system to be a set of
transformations of one space into another with definite probabilities
associated with each transformation, there are two natural
combining operations which produce a third system from two given
systems. The first combining operation is called the product
operation and corresponds to enciphering the message with the
first system R and enciphering the resulting cryptogram with
system S, the keys for R and S being chosen independently.
This total operation is a secrecy system whose transformations
consist of all the products (in the usual sense of products of
transformations) of transformations in S with transformations
in R. The probabilities are the products of the probabilities
for the two transformations.
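The product operation can be made concrete by representing a system as a table of transformations with their probabilities. The Python sketch below is a hypothetical illustration, not the paper's notation: it uses letter-by-letter shift ciphers on a 26-letter alphabet, encodes each transformation as a permutation tuple, and forms the product of two independently keyed systems, pooling the probability of transformations that coincide.

```python
from collections import defaultdict
from fractions import Fraction

N = 26  # alphabet size

def shift(k):
    """Letter-by-letter shift cipher transformation as a permutation tuple."""
    return tuple((x + k) % N for x in range(N))

def compose(s, r):
    """The product S R: apply r first, then s."""
    return tuple(s[r[x]] for x in range(N))

def product_system(R, S):
    """Encipher with R, then with S, keys chosen independently.
    Probabilities multiply; equal products pool their probability."""
    T = defaultdict(Fraction)
    for r, p in R.items():
        for s, q in S.items():
            T[compose(s, r)] += p * q
    return dict(T)

caesar = {shift(k): Fraction(1, N) for k in range(N)}
SR = product_system(caesar, caesar)
print(len(SR), set(SR.values()))  # 26 {Fraction(1, 26)}
```

Enciphering twice with independent uniform shifts gives back a uniform shift system: each of the 26 composite shifts arises from 26 of the 676 key pairs.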

The second combining operation is "weighted addition":

$$T = pR + qS, \qquad p + q = 1.$$

*"Knowledge" is thus identified with a set of propositions having
associated probabilities. We are here at variance with the
doctrine often assumed in philosophical studies, which considers
knowledge to be a set of propositions which are either true or
false.



It corresponds to making a preliminary choice as to whether
system R or S is to be used with probabilities p and q,
respectively. When this is done R or S is used as originally defined.
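Weighted addition has an equally simple sketch in the same representation, a dictionary from transformations to probabilities. The systems and transformation labels below are hypothetical placeholders:

```python
from fractions import Fraction

def weighted_sum(p, R, q, S):
    """T = pR + qS: use system R with probability p or S with probability q,
    then choose a key within that system as originally defined."""
    if p + q != 1:
        raise ValueError("p + q must equal 1")
    T = {}
    for system, weight in ((R, p), (S, q)):
        for transformation, prob in system.items():
            # a transformation present in both systems pools its weight
            T[transformation] = T.get(transformation, 0) + weight * prob
    return T

R = {"R1": Fraction(1, 2), "R2": Fraction(1, 2)}   # hypothetical system R
S = {"S1": Fraction(1)}                            # hypothetical system S
T = weighted_sum(Fraction(1, 3), R, Fraction(2, 3), S)
print(T)  # {'R1': Fraction(1, 6), 'R2': Fraction(1, 6), 'S1': Fraction(2, 3)}
```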

It is shown that secrecy systems with these two
combining operations form essentially a "linear associative algebra"
with a unit element, an algebraic variety that has been
extensively studied by mathematicians. Some of the properties of
this algebra are developed.

Among the many possible secrecy systems there is one
type with many special properties. This type we call a "pure"
system. A system is pure if for any three transformations
$T_i, T_j, T_k$ in the set the product

$$T_i T_j^{-1} T_k$$

is also a transformation in the set, and all keys are equally
likely. That is, enciphering, deciphering, and enciphering with
any three keys must be equivalent to enciphering with some key.
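The defining condition of a pure system can be checked mechanically for small examples. In the Python sketch below (the shift-cipher examples are our own), the full set of 26 shifts with equal probabilities passes the test, while a system using only shifts 0 and 1 fails, since $T_1 T_0^{-1} T_1$ is a shift by 2:

```python
from fractions import Fraction
from itertools import product

N = 26

def shift(k):
    return tuple((x + k) % N for x in range(N))

def compose(a, b):
    """Operator product a b: apply b first, then a."""
    return tuple(a[b[x]] for x in range(N))

def inverse(a):
    inv = [0] * N
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def is_pure(system):
    """system: dict transformation -> probability.
    Pure: T_i T_j^{-1} T_k is always in the set, and all keys are equally likely."""
    ts = set(system)
    closed = all(compose(ti, compose(inverse(tj), tk)) in ts
                 for ti, tj, tk in product(ts, repeat=3))
    equal = len(set(system.values())) == 1
    return closed and equal

caesar = {shift(k): Fraction(1, N) for k in range(N)}
two_shifts = {shift(0): Fraction(1, 2), shift(1): Fraction(1, 2)}
print(is_pure(caesar), is_pure(two_shifts))  # True False
```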

With a pure cipher it is shown that all keys are
essentially equivalent: they all lead to the same set of a
posteriori probabilities. Furthermore, when a given cryptogram
is intercepted there is a set of messages that might have
produced this cryptogram (a "residue class"), and the a posteriori
probabilities of messages in this class are proportional to the
a priori probabilities. All the information the enemy has
obtained by intercepting the cryptogram is a specification of the
residue class. Many of the common ciphers are pure systems,
including simple substitution with random key. In this case
the residue class consists of all messages with the same pattern
of letter repetitions as the intercepted cryptogram.
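For simple substitution the residue class of a cryptogram is thus fixed by its pattern of letter repetitions. A short Python sketch (function names are ours) computes that pattern and tests whether a candidate message lies in the same residue class as an intercepted cryptogram:

```python
def repetition_pattern(text):
    """Replace each letter by the order of its first appearance,
    e.g. 'LETTER' -> (0, 1, 2, 2, 1, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(c, len(first_seen)) for c in text)

def in_residue_class(message, cryptogram):
    """True if some simple-substitution key maps message to cryptogram."""
    return repetition_pattern(message) == repetition_pattern(cryptogram)

print(repetition_pattern("LETTER"))           # (0, 1, 2, 2, 1, 3)
print(in_residue_class("BETTER", "XQWWQZ"))   # True
print(in_residue_class("LETTUCE", "XQWWQZ"))  # False
```

The enemy's a posteriori probabilities over this class are then simply the a priori message probabilities, renormalized.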

Two systems R and S are defined to be "similar" if
there exists a fixed transformation A with an inverse, $A^{-1}$, such that

$$R = AS.$$

If R and S are similar, a one-to-one correspondence between the
resulting cryptograms can be set up leading to the same a
posteriori probabilities. The two systems are cryptanalytically the
same.

The second main part of the paper deals with the
problem of "theoretical security." How secure is a system against
cryptanalysis when the enemy has unlimited time and manpower
available for the analysis of intercepted cryptograms?

"Perfect secrecy" is defined by requiring of a system
that after a cryptogram is intercepted by the enemy the a
posteriori probabilities of this cryptogram representing various
messages be identically the same as the a priori probabilities
of the same messages before the interception. It is shown that
perfect secrecy is possible but requires, if the number of
messages is finite, the same number of possible keys. If the
message is thought of as being constantly generated at a given
"rate" R (to be defined later), key must be generated at the
same or a greater rate.
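Perfect secrecy can be verified exhaustively on a toy example. The Python sketch below assumes, for illustration, two-bit messages with an arbitrary non-uniform a priori distribution, enciphered by adding the key to the message modulo 2, bit by bit (a Vernam-type cipher). With as many equally likely keys as messages, every cryptogram leaves the a posteriori probabilities equal to the a priori ones; with only two keys it does not:

```python
from fractions import Fraction

MESSAGES = range(4)   # two-bit messages 0..3
# Hypothetical a priori message probabilities.
PRIOR = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

def encipher(m, k):
    return m ^ k      # add the key to the message modulo 2, bit by bit

def posterior(e, keys):
    """A posteriori P(message | cryptogram e), keys chosen uniformly from `keys`."""
    joint = {m: sum((PRIOR[m] * Fraction(1, len(keys))
                     for k in keys if encipher(m, k) == e), Fraction(0))
             for m in MESSAGES}
    total = sum(joint.values())
    return {m: joint[m] / total for m in MESSAGES}

full_key = range(4)    # as many keys as messages
short_key = range(2)   # fewer keys than messages
print(all(posterior(e, full_key) == PRIOR for e in MESSAGES))  # True
print(posterior(0, short_key))
```

With the short key, intercepting cryptogram 0 rules out messages 2 and 3 entirely, so the enemy learns something.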

If a secrecy system with a finite key is used, and N
letters of cryptogram intercepted, there will be, for the enemy,
a certain set of messages with certain probabilities that this
cryptogram could represent. As N increases the field usually
narrows down until eventually there is a unique "solution" to
the cryptogram: one message with probability essentially unity
while all others are practically zero. A quantity Q(N) is
defined, called the equivocation, which measures in a statistical
way how near the average cryptogram of N letters is to a unique
solution; that is, how uncertain the enemy is of the original
message after intercepting a cryptogram of N letters. Various
properties of the equivocation are deduced; for example, the
equivocation of the key never increases with increasing N.
This quantity Q is a theoretical secrecy index, theoretical in
that it allows the enemy unlimited time to analyse the cryptogram.
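The behavior of the key equivocation can be computed exactly for a very small system. The Python sketch below assumes a hypothetical three-letter language with independent letters of probabilities .6, .3, .1, and a shift cipher whose single key (a shift modulo 3) is used for the whole message. It evaluates the mean uncertainty of the key, in base-2 units, after N cryptogram letters by enumerating all messages and keys:

```python
import math
from itertools import product

P = [0.6, 0.3, 0.1]   # hypothetical letter probabilities for letters 0, 1, 2
A = len(P)

def key_equivocation(n):
    """Mean uncertainty of the key (base 2) after n cryptogram letters,
    for a shift cipher whose single key k shifts every letter by k modulo A."""
    joint = {}                                   # (cryptogram, key) -> probability
    for msg in product(range(A), repeat=n):
        p_msg = math.prod(P[x] for x in msg)     # letters chosen independently
        for k in range(A):                       # keys equally likely
            e = tuple((x + k) % A for x in msg)
            joint[e, k] = joint.get((e, k), 0.0) + p_msg / A
    p_e = {}
    for (e, _), p in joint.items():
        p_e[e] = p_e.get(e, 0.0) + p
    # -sum over (cryptogram, key) of P(E, K) log P(K | E)
    return -sum(p * math.log2(p / p_e[e]) for (e, _), p in joint.items() if p > 0)

values = [key_equivocation(n) for n in range(1, 7)]
print([round(v, 3) for v in values])
```

After one letter the key equivocation equals the letter entropy (about 1.296), and the printed values never increase with N, in line with the property stated above.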

The function Q(N) for a certain idealized type of
cipher called the random cipher is determined. With certain
corrections this function can be applied to many cases of
practical interest. This gives a way of calculating approximately
how much intercepted material is required to obtain a solution
to a secrecy system. It appears from this analysis that with
ordinary languages and the usual types of ciphers (not codes)
this "unicity distance" is approximately |K|/D. Here |K| is a
number measuring the "size" of the key space. If all keys are
a priori equally likely, |K| is the logarithm of the number of
possible keys. D is the redundancy of the language and measures
the excess information content of the language. In simple
substitution with random key on English, |K| is $\log_{10} 26!$, or about
20, and D is about .7 for English. Thus unicity occurs at about
30 letters.
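The unicity estimate is simple arithmetic. A Python sketch, taking |K| and D in decimal digits as in the text. Note that $\log_{10} 26!$ evaluates to about 26.6, so with D = .7 the formula |K|/D gives roughly 38 letters; the round figure of 20 quoted above gives about 29:

```python
import math

def unicity_distance(key_size, redundancy):
    """Approximate unicity distance |K| / D (both in decimal digits)."""
    return key_size / redundancy

K = math.log10(math.factorial(26))   # simple substitution, all keys equally likely
D = 0.7                              # redundancy of English per letter, as quoted
print(round(K, 1), round(unicity_distance(K, D)))   # 26.6 38
print(round(unicity_distance(20, D)))               # 29
```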

It is possible to construct secrecy systems with a
finite key for certain "languages" in which the function Q(N)
does not approach zero as $N \to \infty$. In this case, no matter how
much material is intercepted, the enemy still does not get a
unique solution to the cipher but is left with many
alternatives, all of reasonable probability. Such systems we call
ideal systems. It is possible in any language to approximate
such behavior, i.e., to make the approach to zero of Q(N) recede
out to arbitrarily large N. However, such systems have a
number of drawbacks, such as complexity and sensitivity to
errors in transmission of the cryptogram.

The third part of the paper is concerned with
"practical secrecy." Two systems with the same key size may both
be uniquely solvable when N letters have been intercepted, but
differ greatly in the amount of labor required to effect this
solution. An analysis of the basic weaknesses of secrecy
systems is made. This leads to methods for constructing systems
which will require a large amount of work to solve. A certain
incompatibility among the various desirable qualities of
secrecy systems is discussed.



1. Choice, Information and Uncertainty

Suppose we have a set of possible events whose
probabilities of occurrence are $p_1, p_2, \ldots, p_n$. These probabilities
are known, but that is all we know concerning which event will
occur. Can we define a quantity which will measure in some
sense how "uncertain" we are of the outcome? How much "choice"
is involved in the selection of the event by the chance element
that operates with these probabilities? We propose as a
numerical measure of this rather vague notion the quantity

$$H = -\sum_{i=1}^{n} p_i \log p_i.$$
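The quantity H is straightforward to compute. A minimal Python sketch, with the base of the logarithm as a parameter (the choice of base, discussed below, only sets the unit) and terms with $p_i = 0$ taken as zero:

```python
import math

def entropy(probs, base=2):
    """H = -sum p_i log p_i; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0) / math.log(base)

print(entropy([0.5, 0.5]))            # about 1.0: one choice between two alternatives
print(entropy([0.25] * 4))            # about 2.0, i.e. log 4 to base 2
print(entropy([1.0, 0.0, 0.0]) == 0)  # True: a certain outcome
```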

There are many reasons for this particular formula. Quantities
of this kind appear continually in the present paper and in the
study of the transmission of information.

To justify this definition we will state a number of
properties that follow from it. These properties will not be
proved here,* but are easily deduced from the definition.

Properties of $H = -\sum p_i \log p_i$:

1. H = 0 if and only if all the $p_i$ but one are zero, this
one having the value unity. Thus only when we are certain
of the outcome does H vanish.

2. For a given n, H is a maximum and equal to log n if and
only if all the $p_i$ are equal (i.e., 1/n). This is also
intuitively the most uncertain situation.

3. Suppose there are two events in question, with m
possibilities for the first and n for the second. Let $p_{ij}$ be
the probability of the joint occurrence of i for the first
and j for the second. The uncertainty of the joint event
is

$$H = -\sum_{i,j} p_{ij} \log p_{ij}.$$

For given probabilities $p_i = \sum_j p_{ij}$ for the first and
$q_j = \sum_i p_{ij}$ for the second, the quantity H is maximized if
and only if the events are independent, i.e., $p_{ij} = p_i q_j$.
This maximum value is the sum of the individual uncertainties

$$H = H_1 + H_2 = -\sum_i p_i \log p_i - \sum_j q_j \log q_j.$$

These facts can be generalized to any number of different
events.

* It is intended to develop these results in coherent fashion
in a forthcoming memorandum on the transmission of information.

4. Suppose there are two chance events A and B as in 3, not
necessarily independent. We define the mean conditional
uncertainty of B, knowing A, as

$$H_A(B) = \sum_a p(a)\, H_a(B),$$

where $H_a(B)$ is the uncertainty of B when A has the definite
value a. Thus $H_A(B)$ is the average uncertainty of B for
all the different values of A, weighted according to their
different probabilities of occurrence. The uncertainty of the
joint event is the sum of the uncertainty of the first and
the mean conditional uncertainty of the second. In symbols

$$H(A,B) = H(A) + H_A(B).$$

This is true whether or not there are any causal connections
or correlations between the two events.

5. In the same situation the uncertainty of B is not greater
than the joint uncertainty H(A,B):

$$H(B) \le H(A,B).$$

The equality holds if and only if every B (of probability
greater than zero) is consistent with only one A, that
is, if A is uniquely determined by B.

6. From properties 3 and 4 we have

$$H(A) + H(B) \ge H(A,B),$$
$$H(B) \ge H(A,B) - H(A) = H(A) + H_A(B) - H(A),$$
$$H(B) \ge H_A(B).$$

Thus the uncertainty of B is never less than its average
value when we know A. Additional information never
increases average uncertainty. The equality holds if and
only if A and B are independent.

7. Suppose we have a set of probabilities $p_1, p_2, \ldots, p_n$.
Any change toward equalization of these (supposing them
unequal) increases H. Thus if $p_1 < p_2$ and we increase $p_1$,
decreasing $p_2$ by an equal amount (to keep the sum $\sum p_i$
constant at unity) so that $p_1$ and $p_2$ are more nearly equal,
then H increases. More generally, if we perform any "averaging"
operation on the $p_i$ of the form

$$p_i' = \sum_j a_{ij}\, p_j,$$

where $\sum_i a_{ij} = \sum_j a_{ij} = 1$ and all $a_{ij} \ge 0$, then H increases (except
in the special case where this transformation amounts to
no more than a permutation of the $p_i$, with H of course
remaining the same).

8. H measures in a certain sense how much "information is
generated" when the choice is made. Suppose such a chance
event occurs and we wish to describe which of the n possible
events took place. The average amount of paper required
to write it down in a properly chosen notation is,
in the cases of interest to us, about proportional to H.
Thus there might be $10^{30} + 10^{50}$ possible events, with
$10^{30}$ of them having a probability $\tfrac12 \cdot 10^{-30}$ and $10^{50}$ of
them having a probability of $\tfrac12 \cdot 10^{-50}$. We could set up a
notational system to describe which event occurs as follows. We number
the events from 1 up to $10^{30} + 10^{50}$ and when one occurs
write down the corresponding number. The average amount
of paper required will be proportional to the average
number of digits we need. This will be nearly 30 if the
event is in the first group of $10^{30}$, and about 50 if in the
second group. Thus the average number of digits is about
40. We also have

$$H = -10^{30} \cdot \tfrac12 10^{-30} \log_{10}\!\left(\tfrac12 10^{-30}\right) - 10^{50} \cdot \tfrac12 10^{-50} \log_{10}\!\left(\tfrac12 10^{-50}\right) \approx \tfrac12 (30) + \tfrac12 (50) = 40.$$

9. Although the last result is only approximately true when
the number of choices is finite, it becomes exactly true
when an unlimited sequence of choices is made. Thus if
a sequence of N independent choices is made, each choice
being from n possibilities with probabilities
$p_1, p_2, \ldots, p_n$, then the total amount of information
generated is

$$H = -N \sum_i p_i \log p_i.$$

If N is sufficiently large, the expected number of digits
required to register the particular choice made is
arbitrarily close to H, providing the correspondence between
sequences of digits and sets of choices is correctly made.
If incorrectly made it will be greater than H. Moreover,
if N is sufficiently large the probability of needing
more than H digits is very small.

10. It can be shown that if we require certain reasonable
properties of a measure of choice or uncertainty, then the
formula $-\sum p_i \log p_i$ necessarily follows. These required
properties and the proof of this statement are given in
Appendix I. The chief property is that the measure be in
a sense additive: if a choice be decomposed into a series
of choices, the total choice is the sum (properly weighted)
of the individual choices.

11. Finally we note that quantities of the type $\sum p_i \log p_i$
have appeared previously as measures of randomness,
particularly in statistical mechanics. Indeed the H in Boltzmann's
H theorem is defined in this way, $p_i$ being the probability
of a system being in cell i of its phase space. Most of
the entropy formulas contain terms of this type.
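Several of the properties above (3, 4, 6, 7 and 10) can be spot-checked numerically. The Python sketch below uses a small hypothetical joint distribution of two correlated events A (rows) and B (columns), an arbitrary doubly stochastic matrix for the averaging operation, and logarithms to base 2. It checks particular instances of the properties; it does not prove them:

```python
import math

def H(probs):
    """-sum p log2 p over the nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flat(table):
    return [x for row in table for x in row]

def mean_conditional(joint):
    """H_A(B) = sum_a p(a) H_a(B), where row a of `joint` holds p(a, b)."""
    return sum(sum(row) * H([x / sum(row) for x in row])
               for row in joint if sum(row) > 0)

# Hypothetical joint probabilities p(a, b) of two correlated events.
joint = [[0.45, 0.25, 0.00],
         [0.05, 0.00, 0.25]]
p = [sum(row) for row in joint]            # distribution of A
q = [sum(col) for col in zip(*joint)]      # distribution of B
independent = [[pi * qj for qj in q] for pi in p]

# Property 3: with these marginals, independence gives the maximum H1 + H2.
h_joint, h_indep, h_sum = H(flat(joint)), H(flat(independent)), H(p) + H(q)
# Property 4: H(A, B) = H(A) + H_A(B).
h_chain = H(p) + mean_conditional(joint)
# Property 6: H(B) >= H_A(B).
h_b, h_b_given_a = H(q), mean_conditional(joint)

# Property 7: averaging with a doubly stochastic matrix raises H.
a = [[0.6, 0.3, 0.1],
     [0.3, 0.5, 0.2],
     [0.1, 0.2, 0.7]]                      # rows and columns each sum to 1
r = [0.7, 0.2, 0.1]
r_avg = [sum(a[i][j] * r[j] for j in range(3)) for i in range(3)]

# Property 10: decomposing a choice, H(1/2, 1/3, 1/6) = H(1/2, 1/2) + 1/2 H(2/3, 1/3).
h_direct = H([1/2, 1/3, 1/6])
h_decomposed = H([1/2, 1/2]) + 0.5 * H([2/3, 1/3])

print(h_joint, h_indep, h_sum)
print(h_joint, h_chain)
print(h_b, h_b_given_a)
print(H(r), H(r_avg))
print(h_direct, h_decomposed)
```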

The base which is used in taking logarithms in the formula
amounts to a choice of the unit of measure. If the base is 10
we will call the resulting units "digits"; if the base is 2
the units will be called "alternatives." One digit is about 3.32
alternatives. A choice from 1000 equally likely possibilities
is 3 digits or about 10 alternatives.
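The units, the $10^{30}$/$10^{50}$ example of property 8, and the coding claim of property 9 can all be checked numerically. In the Python sketch below, the group-by-group sum avoids listing $10^{50}$ events. For property 9 we swap in a binary Huffman code, a later construction used here only as one example of a "correctly made" correspondence between digit sequences and sets of choices, applied to blocks of N independent choices with hypothetical probabilities .9 and .1. The average number of alternatives per choice stays between H and H + 1/N, so it approaches H as N grows:

```python
import heapq
import itertools
import math

# Units: base-10 logarithms give "digits", base-2 logarithms "alternatives".
alternatives_per_digit = math.log2(10)                 # about 3.32
choice_of_1000 = (math.log10(1000), math.log2(1000))   # 3 digits, about 10 alternatives

# Property 8: 10**30 events of probability (1/2)10**-30 and 10**50 events of
# probability (1/2)10**-50, summed group by group (units: digits).
groups = [(10**30, 0.5e-30), (10**50, 0.5e-50)]
H8 = -sum(count * p * math.log10(p) for count, p in groups)   # about 40.3

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    tie = itertools.count()              # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1              # each merge adds one binary digit to its members
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

def alternatives_per_choice(p, n):
    """Average code length per choice when blocks of n independent choices are coded."""
    blocks = list(itertools.product(range(len(p)), repeat=n))
    block_p = [math.prod(p[i] for i in b) for b in blocks]
    lengths = huffman_lengths(block_p)
    return sum(bp * length for bp, length in zip(block_p, lengths)) / n

# Property 9 with hypothetical probabilities .9 and .1.
p = [0.9, 0.1]
H = -sum(x * math.log2(x) for x in p)                  # about 0.469 alternatives per choice
rates = {n: alternatives_per_choice(p, n) for n in (1, 2, 4, 8)}

print(alternatives_per_digit, choice_of_1000, H8)
for n, rate in rates.items():
    print(n, round(rate, 4), round(H, 4))
```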

2. Language as a Stochastic Process

A natural language, such as English, can be studied
from many points of view: lexicography, syntax, semantics,
history, aesthetics, etc. The only properties of a language
of interest in cryptography are statistical properties. What
are the frequencies of the various letters, of different digrams
(pairs of letters), trigrams, words, phrases, etc.? What is
the probability that a given word occurs in a certain message?
The "meaning" of a message has significance only in its
influence on these probabilities. For our purposes all other
properties of language can be omitted. We consider a language,
therefore, to be a stochastic (i.e., a statistical) process which
generates a sequence of symbols according to some system of
probabilities. The symbols will be the letters of the language
together with punctuation, spaces, etc., if these occur.

Conversely any stochastic process which produces a
discrete sequence of symbols will be said to be a language.
This will include such cases as:

1. Natural written languages such as English, German, Chinese.

2. Continuous information sources that have been rendered
discrete by some quantizing process, for example, the
quantized speech from a PCM transmitter, or a quantized
television signal.

3. "Artificial" languages, where we merely define abstractly
a stochastic process which generates a sequence of symbols.
The following are examples of artificial languages.

(A) Suppose we have 5 letters A, B, C, D, E which are
chosen each with probability .2, successive choices
being independent. This would lead to a sequence of
which the following is a typical example.

This was constructed with the use of a table of random
numbers.*

(B) Using the same 5 letters let the probabilities be
.4, .1, .2, .2, .1 respectively, with successive
choices independent. A typical "text" in this
language is then:

A A A C D C B D C E A A D A D A C E D A
E A D C A B E D A D D C E C A A A A A D

(C) A more complicated structure is obtained if successive
letters are not chosen independently but their
probabilities depend on preceding letters. In the simplest
case of this type a choice depends only on the
preceding letter and not on ones before that. The
statistical structure can then be described by a
set of transition probabilities $p_i(j)$, the probability
that letter i is followed by letter j. The indices
i and j range over all the letters in the language.
A second equivalent way of specifying the structure
is to give the digram probabilities $p(i,j)$, the
relative frequency of the digram i j in the language.
The letter frequencies $p(i)$ (the probability of
letter i), the transition probabilities $p_i(j)$ and the
digram probabilities $p(i,j)$ are related by the following
formulas:

$$p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\, p_j(i),$$
$$p(i,j) = p(i)\, p_i(j),$$
$$\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1.$$

* Kendall and Smith, "Tables of Random Sampling Numbers,"
Cambridge, 1939.

As a specific example suppose there are three letters
A, B, C with the probability tables:

[Probability tables for the letters A, B, C; illegible in the scan.]

A typical text in this language is the following.

B A B A B A B B B A C A C A B B A B B B B A B B
A A C B B B A B A

The next increase in complexity would involve trigram
frequencies but no more. The choice of a letter would
depend on the preceding two letters but not on the
text before that point. A set of trigram frequencies
$p(i,j,k)$ or equivalently a set of transition
probabilities $p_{ij}(k)$ would be required. Continuing in
this way one obtains successively more complicated
stochastic processes. In the general n-gram case
a set of n-gram probabilities $p(i_1, i_2, \ldots, i_n)$
or of transition probabilities $p_{i_1 i_2 \ldots i_{n-1}}(i_n)$
is required to specify the statistical structure.

(D) Stochastic processes can also be defined which
produce a text consisting of a sequence of "words."
Suppose there are 5 letters A, B, C, D, E and 16
"words" in the language with associated probabilities:

.10 A       .16 BEBE    .11 CABED   .04 DEB
.04 ADEB    .04 BED     .05 CEED    .15 DEED
.05 ADEE    .02 BEED    .08 DAB     .01 EAB
.01 BADD    .05 CA      .04 DAD     .05 EE

Suppose successive "words" are chosen independently
and are separated by a space. A typical message
might be:

If all the words are of finite length this process
is equivalent to one of the preceding type, but the
description may be simpler in terms of the word
structure and probabilities. We may also generalize
here and introduce transition probabilities between
words, etc.
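Examples (A) to (D) can be simulated directly. In the Python sketch below, a seeded pseudorandom generator stands in for the table of random numbers. The transition table for (C) is hypothetical (the tables in this copy are illegible), paired with the letter frequencies that make it consistent; the code checks the relations among $p(i)$, $p_i(j)$ and $p(i,j)$ given for (C) exactly with fractions. The word probabilities are those listed for (D):

```python
import random
from fractions import Fraction as F

rng = random.Random(1945)   # seeded pseudorandom numbers in place of a printed table

def iid_text(letters, probs, n):
    """Languages (A) and (B): each letter chosen independently."""
    return " ".join(rng.choices(letters, weights=probs, k=n))

# Language (C): hypothetical transition probabilities p_i(j) and matching p(i).
TRANS = {"A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
         "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
         "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)}}
FREQ = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}
DIGRAM = {(i, j): FREQ[i] * TRANS[i][j] for i in TRANS for j in TRANS}  # p(i,j) = p(i) p_i(j)

for i in TRANS:
    assert sum(TRANS[i].values()) == 1                           # sum_j p_i(j) = 1
    assert FREQ[i] == sum(DIGRAM[i, j] for j in TRANS)           # sum_j p(i,j)
    assert FREQ[i] == sum(DIGRAM[j, i] for j in TRANS)           # sum_j p(j,i)
    assert FREQ[i] == sum(FREQ[j] * TRANS[j][i] for j in TRANS)  # sum_j p(j) p_j(i)

def digram_text(n, letter="A"):
    """Language (C): each letter depends only on the preceding one."""
    out = []
    for _ in range(n):
        nxt = list(TRANS[letter])
        letter = rng.choices(nxt, weights=[float(TRANS[letter][j]) for j in nxt])[0]
        out.append(letter)
    return " ".join(out)

# Language (D): the sixteen words and probabilities listed above.
WORDS = {"A": .10, "BEBE": .16, "CABED": .11, "DEB": .04,
         "ADEB": .04, "BED": .04, "CEED": .05, "DEED": .15,
         "ADEE": .05, "BEED": .02, "DAB": .08, "EAB": .01,
         "BADD": .01, "CA": .05, "DAD": .04, "EE": .05}

def word_text(n):
    """Successive words chosen independently, separated by a space."""
    return " ".join(rng.choices(list(WORDS), weights=list(WORDS.values()), k=n))

print(iid_text("ABCDE", [0.2] * 5, 20))                  # (A)
print(iid_text("ABCDE", [0.4, 0.1, 0.2, 0.2, 0.1], 20))  # (B)
print(digram_text(30))                                   # (C)
print(word_text(10))                                     # (D)
```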


These artificial languages are useful in constructing
simple problems and examples to illustrate various possibilities.
We can also approximate to a natural language by means of a
series of simple artificial languages. The zero order
approximation is obtained by choosing all letters with the same
probability and independently. The first order approximation is
obtained by choosing successive letters independently but each
letter having the same probability that it does in the natural
language. Thus in the first order approximation to English, E
is chosen with probability .12 (its frequency in normal English)
and W with probability .02, but there is no influence between
adjacent letters and no tendency to form the preferred digrams
such as TH, ED, etc. In the second order approximation digram
structure is introduced. After a letter is chosen, the next
one is chosen in accordance with the frequencies with which
the various letters follow the first one. This requires a
table of digram frequencies $p_i(j)$, the frequency with which
letter j follows letter i. In the third order approximation
trigram structure is introduced. Each letter is chosen with
probabilities which depend on the preceding two letters.

3. The Series of Approximations to English

To give a visual idea of how this series of processes
approaches a language, typical sequences in the approximations
to English have been constructed and are given below. In all
cases we have assumed a 27-symbol "alphabet," the 26 letters
and a space.

1. Zero order approximation (symbols independent and
equiprobable).

2. First order approximation (symbols independent but with
frequencies of English text).

3. Second order approximation (digram structure as in English).

4. Third order approximation (trigram structure as in English).

5. First order word approximation. Rather than continue with
tetragram, ..., n-gram structure, it is easier and better
to jump at this point to word units. Here words are
chosen independently but with their appropriate frequencies.

6. Second order word approximation. The word transition
probabilities are correct but no further structure is
included.

The resemblance to ordinary English text increases
quite noticeably at each of the above steps. Note that these
samples have reasonably good structure out to about twice the
range that is taken into account in their construction. Thus
in (3) the statistical process insures reasonable text for
two-letter sequences, but four-letter sequences from the sample
can usually be fitted into good sentences. In (6) sequences of
four or more words can easily be placed in sentences without
unusual or strained constructions. The particular sequence of ten
words "attack on an English writer that the character of this"
is not at all unreasonable.

The first two samples were constructed by the use of
a book of random numbers in conjunction, for (2), with a table
of letter frequencies. This method might have been continued
for (3), (4), and (5), since digram, trigram, and word frequency
tables are available, but a simpler equivalent method was used.
To construct (3), for example, one opens a book at random and
selects a letter at random on the page. This letter is
recorded. The book is then opened to another page and one reads
until this letter is encountered. The succeeding letter is
then recorded. Turning to another page, this second letter is
searched for and the succeeding letter recorded, etc. A similar
process was used for (4), (5), and (6). It would be interesting
if further approximations could be constructed, but the labor
involved becomes enormous at the next stage.
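The book-opening procedure is easy to imitate in code. The Python sketch below treats a sample text as the "book" (wrapped around at the end so that every context has a successor), opens it at a random position, reads on until the current context of order − 1 letters appears, and records the letter that follows. Order 1 gives the first order approximation, order 2 the second, and so on. As with the physical procedure, occurrences that follow long stretches without the context are somewhat favored. The sample text and function name are hypothetical:

```python
import random

def book_method(text, order, length, rng):
    """Approximate `text` to the given order (1 = letter frequencies,
    2 = digram structure, 3 = trigram structure) by the book procedure."""
    k = order - 1                      # letters of context carried forward
    n = len(text)
    ring = text + text[:k]             # the book is read cyclically
    start = rng.randrange(n)
    out = ring[start:start + k]        # initial context taken at random
    while len(out) < length:
        context = out[len(out) - k:]   # empty when order == 1
        pos = rng.randrange(n)         # open the book at a random page
        for offset in range(n):        # read on until the context appears
            i = (pos + offset) % n
            if ring.startswith(context, i):
                out += ring[i + k]     # record the succeeding letter
                break
    return out[:length]

book = "THE CAT SAT ON THE MAT AND THE DOG SAT ON THE LOG "
print(book_method(book, 3, 60, random.Random(0)))
```

Every window of `order` consecutive letters in the output occurs somewhere in the (cyclic) book, which is exactly the n-gram structure the procedure is meant to reproduce.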

The stochastic process (6) is already sufficiently close to English for many cryptographic purposes, since most cryptanalysis is based on "local" structure of not more than two or three words in length.


4. Graphical Representation of a Markoff Process

Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature.* The general case can be described as follows. There exists a finite number of possible "states" of a system, $S_1, S_2, \ldots, S_n$. In addition there is a set of transition probabilities, $q_i(j)$, the probability that if the system is in state $S_i$ it will next go to state $S_j$. To make this Markoff process into a language generator we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

* For a detailed treatment see M. Fréchet, "Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles," Paris, Gauthier-Villars, 1938.

The situation can be represented graphically as shown in Figs. 1, 2, 3 and 4. The "states" are the junction points in the graph, and the probabilities and letters produced for each transition are given beside the corresponding line. Fig. 1 is for example B in Section 2, while Fig. 2 corresponds to example C. In Fig. 1 there is only one state, since successive letters are independent. In Fig. 2 there are as many states as letters. If a trigram example were constructed there would be at most $n^2$ states, corresponding to the possible pairs of letters preceding the one being chosen. Figs. 3 and 4 show graphs for the case of word structure in example D. In these, S corresponds to the "space" symbol. In Fig. 3 each word has a separate chain of branches from the left to the right junction point, while in Fig. 4 the branches have been combined, simplifying the graph.
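A graph of this kind translates directly into a generator. As a minimal sketch, assuming the graph is stored as a dictionary from each junction point to its outgoing lines (the three states, letters and probabilities below are hypothetical, not those of Figs. 1 to 4), the generator picks one line leaving the current state, records its letter, and moves to the state at the end of that line:

```python
import random

# Hypothetical graph: each state maps to (probability, letter, next_state)
# triples, i.e. the labelled lines leaving that junction point.
GRAPH = {
    "S1": [(0.5, "a", "S1"), (0.5, "b", "S2")],
    "S2": [(0.4, "c", "S3"), (0.6, "a", "S1")],
    "S3": [(1.0, "b", "S2")],
}

def generate(graph, start, n_letters, rng):
    """Walk the graph, emitting one letter per transition."""
    state, out = start, []
    for _ in range(n_letters):
        lines = graph[state]
        weights = [p for p, _, _ in lines]
        _, letter, state = rng.choices(lines, weights=weights, k=1)[0]
        out.append(letter)
    return "".join(out)

if __name__ == "__main__":
    print(generate(GRAPH, "S1", 30, random.Random(1)))
```

Here the state is the "residue of influence": in this particular graph a "c" can only follow a "b" and is always followed by one, a constraint carried entirely by which state the process is in.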

5. Pure and Mixed Languages

As we have indicated above, a "language" for our purposes can be considered to be generated by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in cryptographic work. This special class consists of the "ergodic" processes, and we shall call the corresponding languages "pure languages." Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties. Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence, but the set for which it is false has probability zero. Roughly, the ergodic property means statistical homogeneity.

All the examples of artificial languages given above are pure, the corresponding Markoff process being ergodic. This property is related to the structure of the corresponding graph. If the graph has two properties, the language it generates will be pure. These properties are:

1. The graph cannot be divided into two parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of arrows and also impossible to go from nodes in part B to nodes in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 4 the series BEE is a circuit of length 4. The second property required is that the greatest common divisor of the lengths of all circuits in the graph be one.
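Both properties can be checked mechanically. The sketch below reads condition 1 in the common strict sense, that every junction point can be reached from every other (the graph is "strongly connected"); this is an assumption of the sketch. For condition 2 it uses the standard fact that, in such a graph, the gcd of all circuit lengths equals the gcd of `level[u] + 1 - level[v]` over all lines u to v, where `level` is the breadth-first distance from any fixed starting point. Graphs are given as hypothetical `{node: [successors]}` dictionaries with letters and probabilities omitted.

```python
from collections import deque
from math import gcd

def reachable(adj, start):
    """All nodes reachable from start along lines of the graph."""
    seen, todo = {start}, deque([start])
    while todo:
        u = todo.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

def is_pure(adj):
    """Condition 1 (strict reading) and condition 2 for {node: [successors]}."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    root = next(iter(nodes))
    reverse = {u: [] for u in nodes}
    for u, vs in adj.items():
        for v in vs:
            reverse[v].append(u)
    # Condition 1: root reaches every node, and every node reaches root.
    if reachable(adj, root) != nodes or reachable(reverse, root) != nodes:
        return False
    # Condition 2: gcd of circuit lengths, computed from breadth-first levels.
    level, todo = {root: 0}, deque([root])
    while todo:
        u = todo.popleft()
        for v in adj.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                todo.append(v)
    d = 0
    for u, vs in adj.items():
        for v in vs:
            d = gcd(d, level[u] + 1 - level[v])
    return d == 1

print(is_pure({"x": ["y"], "y": ["z", "x"], "z": ["x"]}))   # circuits of length 2 and 3
```

The example graph has circuits of lengths 2 and 3, whose gcd is one, so it passes both tests.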

If the first condition is satisfied but the second one is violated by having the greatest common divisor equal to d > 1, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to d - 1 any sequence can be made statistically equivalent to any other. A simple example with d = 2 is the following. There are three possible letters a, b, c. Letter a is followed by either b or c, with probabilities 1/3 and 2/3 respectively. Either b or c is always followed by letter a. Thus a typical sequence is

a b a c a c a c a b a c a b a b a c a c . . .

This type of situation is not of much importance for our work.
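The periodic structure of the a, b, c example is easy to see in a simulated sample. A minimal sketch, assuming the sequence starts with a (the choice of starting letter fixes which of the d = 2 classes the sequence falls into):

```python
import random

def periodic_sample(n, rng):
    """a -> b (1/3) or c (2/3); b and c are always followed by a."""
    out = ["a"]
    while len(out) < n:
        if out[-1] == "a":
            out.append("b" if rng.random() < 1 / 3 else "c")
        else:
            out.append("a")
    return "".join(out)

s = periodic_sample(1000, random.Random(7))
# Every even position (counting from 0) holds 'a', and every odd position
# holds 'b' or 'c'. Shifting the origin by one letter gives the other class.
print(s[:20])
print(set(s[0::2]), sorted(set(s[1::2])))
```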

If the first condition is violated, the graph may be "separated" into a set of subgraphs, each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case what may be called a "mixed" language made up of a number of pure components. The components correspond to the various subgraphs. If $L_1, L_2, L_3, \ldots$ are the component languages we may write

$$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$

where $p_i$ is the a priori probability of the component language $L_i$.

Physically the situation represented is this. There are several different languages $L_1, L_2, L_3, \ldots$ which are each of homogeneous statistical structure (i.e., they are pure languages). We do not know a priori which is to be used, but once the sequence starts in a given pure component $L_i$ it continues indefinitely according to the statistical structure of that component. We do have, however, a set of a priori probabilities $p_1, p_2, \ldots$ for the various components.

As an example one may take two of the artificial languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A sequence from the mixed language

$$L = .2\,L_1 + .8\,L_2$$

would be obtained by choosing first $L_1$ or $L_2$ with probabilities .2 and .8, and after this choice generating a sequence from whichever was chosen.
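The two-stage choice can be simulated directly. This is a sketch under stated assumptions: the two components are hypothetical two-letter languages with independent letters, not the artificial languages of Section 2. What it shows is the defining feature of a mixed language: each long sequence settles on the statistics of its own component, not on the ensemble average.

```python
import random

def sample_mixed(components, priors, n, rng):
    """Choose one pure component once, then draw all n letters from it."""
    letter_probs = rng.choices(components, weights=priors, k=1)[0]
    letters, probs = zip(*letter_probs.items())
    return "".join(rng.choices(letters, weights=probs, k=n))

# Hypothetical pure components with independent letters.
L1 = {"a": 0.9, "b": 0.1}
L2 = {"a": 0.2, "b": 0.8}

rng = random.Random(3)
for _ in range(5):
    seq = sample_mixed([L1, L2], [0.2, 0.8], 2000, rng)
    # Close to .9 or to .2, never to the ensemble average .2*.9 + .8*.2 = .34
    print(round(seq.count("a") / len(seq), 2))
```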

A natural language, such as English or German, is not, of course, pure. Different kinds of text (literary, newspaper, technical or military) display consistently different types of structure. These differences are small, however, in comparison with the differences between different natural languages. If only local structure (letter, digram and trigram frequencies, for instance) is of much importance, it is reasonable to consider "normal English" to be nearly pure.

6. Information Rate and Redundancy of a Language 

Suppose we have a pure language L produced by a given Markoff process. Associated with the language there are certain parameters which are of significance in questions of transforming the language and in cryptography. The most important of these is what we will call the "information rate" R for the language. It measures the rate at which the Markoff process "generates information," as determined by the amount of choice available, on the average, per letter of text produced. In Section 1 we defined the amount of choice when there are various possibilities with probabilities $p_1, p_2, \ldots, p_n$ as

$$H = -\sum_{i=1}^{n} p_i \log p_i .$$

In a Markoff process with a number of different "states" there will be a choice value $H_i$ for each of these states and a probability of being in each of the states (or a frequency with which this state occurs). If this relative frequency for state i is $P_i$, the average amount of choice is

$$R = \sum_i P_i H_i$$

summed over all the states. This is the definition of the information rate for the language. If $p_i(j)$ is the probability of producing letter j when in state i, we have

$$H_i = -\sum_j p_i(j) \log p_i(j),$$

the sum being over all the letters in the language. Thus

$$R = -\sum_{i,j} P_i\, p_i(j) \log p_i(j).$$

The information rate R has the units of alternatives (or digits) per letter, since it measures the average amount of choice per letter of text that is produced.
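The formula for R can be evaluated numerically once the state frequencies $P_i$ are known. A minimal sketch, assuming (as in the digram model) that each state is simply the preceding letter, so that row i of a transition matrix `Q` lists the $p_i(j)$. The $P_i$ are found by power iteration on the "lazy" chain (I + Q)/2, which has the same stationary frequencies as Q but also converges when Q is periodic. Logarithms are to base 2, so R comes out in binary alternatives per letter.

```python
from math import log2

def stationary(Q, iters=2000):
    """State frequencies P_i for a transition matrix Q (rows sum to 1)."""
    n = len(Q)
    P = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * P[j] for j in range(n)]      # lazy chain: stay with prob 1/2
        for i in range(n):
            for j in range(n):
                nxt[j] += 0.5 * P[i] * Q[i][j]
        P = nxt
    return P

def information_rate(Q):
    """R = -sum_i P_i sum_j p_i(j) log2 p_i(j), with p_i(j) = Q[i][j]."""
    P = stationary(Q)
    return -sum(P[i] * q * log2(q) for i, row in enumerate(Q) for q in row if q > 0)

# The a, b, c example of Section 5 (states a, b, c in that order).
Q_abc = [[0, 1 / 3, 2 / 3], [1, 0, 0], [1, 0, 0]]
print(round(information_rate(Q_abc), 3))
```

For the a, b, c example only state a offers any choice, and the process is in that state half the time, so R is half the choice $H(1/3, 2/3)$, about 0.459 alternatives per letter.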

A second parameter of importance is the "maximum rate" $R_0$ for the source. This is defined simply as the logarithm of the number of different letters in the language. $R_0$ is also measured in alternatives or digits per letter. If successive letters are chosen independently and each letter is equally likely, $R_0 = R$. Otherwise we have $R < R_0$.

$R_0$ and R are actually two limiting cases of information rates for the language. $R_0$ may be said to be the rate when no statistical structure is taken into consideration, and R is the rate when all the structure is taken into account. Between these there is an infinite series of rates $R_1, R_2, R_3, \ldots$ which take some of the statistical structure into account. $R_1$ takes the letter frequencies into account and is defined by

$$R_1 = -\sum_i p(i) \log p(i)$$

where p(i) is the probability of letter i. $R_2$ takes digram structure into account and is defined by

$$R_2 = -\sum_{i,j} p(i)\, p_i(j) \log p_i(j)$$

where the p(i) are letter probabilities and the $p_i(j)$ the transition probabilities, i.e., the probability of letter i being followed by letter j. In general we define

$$R_n = -\sum_{i_1, \ldots, i_n} p(i_1, \ldots, i_{n-1})\, p_{i_1 \cdots i_{n-1}}(i_n) \log p_{i_1 \cdots i_{n-1}}(i_n)$$

where the sum is on all indices $i_1, \ldots, i_n$, $p(i_1, \ldots, i_{n-1})$ is the probability of the (n-1)-gram $i_1 \cdots i_{n-1}$, and $p_{i_1 \cdots i_{n-1}}(i_n)$ is the probability of this (n-1)-gram being followed by letter $i_n$. $R_n$ may be called the n-gram information rate for the language. It can be shown that

$$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R .$$
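The n-gram rates can be estimated from a sample of text by replacing each probability in the formula for $R_n$ with a relative frequency. This is a sketch only: with a finite sample the estimates for large n are biased downward, since many (n-1)-grams are seen only once and then appear to determine their successor completely.

```python
from collections import Counter
from math import log2

def ngram_rate(text, n):
    """Estimate R_n (binary alternatives per letter) from n-gram counts in text.

    p(i_1..i_{n-1}) p_{i_1..i_{n-1}}(i_n) is estimated by count(n-gram) / total,
    and p_{i_1..i_{n-1}}(i_n) by count(n-gram) / count of its (n-1)-gram context.
    """
    grams = Counter(text[k:k + n] for k in range(len(text) - n + 1))
    contexts = Counter()
    for g, c in grams.items():
        contexts[g[:-1]] += c
    total = sum(grams.values())
    return -sum(c / total * log2(c / contexts[g[:-1]]) for g, c in grams.items())

sample = "ab" * 50
print([round(ngram_rate(sample, n), 3) for n in (1, 2, 3)])
```

For the strictly alternating sample, $R_1$ is one alternative per letter (two equally frequent letters) while $R_2$ drops to zero, since each letter fixes the next.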

These rates determine how much a language can be "compressed" in length by a suitable encoding process. A language with maximum rate $R_0$ and rate R can be transformed in such a way that a sequence N letters long is transformed into a sequence only N' letters long, where

$$N' R_0 = N R .$$

(This is approximate and only exactly true in the limit as $N \to \infty$.) Thus the information is "compressed" in the ratio

$$\frac{N'}{N} = \frac{R}{R_0}.$$

This is the greatest compression ratio possible. It makes use of all the statistical structure of the language. If only n-gram structure is made use of, a compression ratio

$$\frac{R_n}{R_0}$$

is the best possible.

The compression obtained in this way is only a statistical gain. Some infrequent sequences are encoded into much longer sequences, while the more probable ones go into shorter sequences, so that on the average the length is decreased. It is the type of compression obtained in telegraphy by using the shortest telegraph symbol, a single dot, for the most frequent letter E, while uncommon letters Q, Z, etc., are encoded into longer telegraph symbols. An average reduction in time of transmission is obtained, but there are possible sequences, e.g., Q Q Q ..., which require much longer.

Performing a transformation on a language L which compresses as much as possible will be called reducing L to a "normal" form. When this has been done it can be shown that all letters in the output are equally likely and independent. Actually to realize this transformation would usually require an infinitely complex machine, but we can always approximate it as closely as desired with a machine of finite complexity.
The quantity

$$D = R_0 - R$$

will be called the redundancy rate of the language. It measures the excess information that is sent if sequences in the language are transmitted in their original form (without compression or reduction to normal form). Correspondingly there is a whole series of redundancy rates:

$$D_1 = R_0 - R_1$$
$$D_2 = R_0 - R_2$$
$$\vdots$$
$$D_n = R_0 - R_n$$

where $D_n$ is the redundancy rate due to n-gram structure in the language.

The redundancy D can also be said to measure the amount of statistical structure in the language. If the sequence is purely random, D = 0, while at the other extreme, if each letter is completely determined by preceding letters with no freedom of choice, D has its maximum possible value $R_0$. It is sometimes convenient to use the "relative" redundancy $D/R_0$, which must lie between 0 and 100%.


If we have a source of rate R and maximum rate $R_0$ (both in digits per letter) and consider the possible sequences of N letters, these fall into two groups for N large. One group of "high probability" sequences contains about

$$10^{RN}$$

sequences (where we have assumed R measured in digits per letter). All of these have substantially the same logarithmic probability. The remainder of the total of $10^{R_0 N}$ possible sequences are of very small probability. In fact their total probability approaches zero as N increases. The logarithm of the probability of an individual sequence in the high probability group is thus about $-RN$. In a precise statement of these results we must allow a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$, where $\epsilon \to 0$ as $N \to \infty$.

Reduction of a language to normal form is performed by properly matching the probabilities of sequences to the lengths of the corresponding sequences in the normal form. The "high probability" sequences are translated into short sequences and the remainder into longer sequences.
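The split into a high probability group and a remainder can be computed exactly for a small source. The sketch below assumes the four-letter language of the example that follows (A, B, C, D with probabilities 1/2, 1/4, 1/8, 1/8, independent letters) and a fixed, hypothetical fuzziness $\epsilon = 0.1$. It measures information in binary digits, so the group should hold about $2^{RN}$ sequences rather than $10^{RN}$. A sequence with $n_A$ A's, $n_B$ B's and $n_{CD}$ letters from {C, D} has $-\log_2 P = n_A + 2 n_B + 3 n_{CD}$, so the sum runs over letter counts rather than over all $4^N$ sequences.

```python
from math import comb, log2

R = 1.75    # rate of the example, in binary alternatives per letter
EPS = 0.1   # assumed fuzziness allowed around R

def high_probability_group(N, eps=EPS):
    """Return (total probability, log2(number of sequences) / N) for the
    N-letter sequences whose -log2 P lies within (R +/- eps) N."""
    prob_num, count = 0, 0
    for nA in range(N + 1):
        ways_A = comb(N, nA)
        for nB in range(N - nA + 1):
            nCD = N - nA - nB
            if abs(nA + 2 * nB + 3 * nCD - R * N) <= eps * N:
                ways = ways_A * comb(N - nA, nB)  # placements of the A's and B's
                count += ways * 2 ** nCD          # each remaining letter is C or D
                prob_num += ways * 2 ** nA        # group probability: ways * 2**nA / 4**N
    rate = log2(count) / N if count else float("-inf")
    return prob_num / 4 ** N, rate

for N in (25, 100, 400):
    p, r = high_probability_group(N)
    print(f"N={N:4d}  probability={p:.3f}  log2(count)/N={r:.3f}")
```

As N grows the group's total probability climbs toward one, while the number of sequences in it stays within $2^{(R \pm \epsilon)N}$, a vanishing fraction of the $2^{R_0 N} = 4^N$ possible sequences.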

An example will clarify the results we have given. Let the language contain 4 letters A, B, C, D. In a sequence, successive letters are chosen independently, the four letters having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have

$$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$

$$R_1 = R_2 = \cdots = R = -\left(\tfrac12 \log_2 \tfrac12 + \tfrac14 \log_2 \tfrac14 + \tfrac28 \log_2 \tfrac18\right) = \tfrac12 + \tfrac12 + \tfrac34 = \tfrac74 \text{ alternatives/letter}$$

By a suitable transformation the average length of sequences can be reduced by the factor $R/R_0 = (7/4)/2 = 7/8$. A transformation to do it is the following. First we translate into a sequence of binary digits (0 or 1) by the following table:

A    0
B    10
C    110
D    111

After this, pairs of the binary digits are translated into the original alphabet as follows:

00    A'
01    B'
10    C'
11    D'

For a typical sequence this works out as shown below:

A B C A B A C B B D A A D A D A
0 10 110 0 10 0 110 10 10 111 0 0 111 0 111 0

Regrouping and translation back into letters:

01 01 10 01 00 11 01 01 01 11 00 11 10 11 10
B' B' C' B' A' D' B' B' B' D' A' D' C' D' C'

In this case there are 16 letters in the original and 15 in the final text. Thus, due to the small redundancy and the shortness of the text, only part of the saving is evident. In a long text, however, the full reduction of 7/8 would appear. This may be verified directly in this case. In a long text of N letters each letter will appear with about its appropriate frequency. Thus the number of binary digits will be about

$$N\left[\tfrac12 \cdot 1 + \tfrac14 \cdot 2 + \tfrac18 \cdot 3 + \tfrac18 \cdot 3\right] = \tfrac74 N,$$

since each A gives one binary digit, each B gives two, etc. The number of letters in the final text is half this, since each pair of binary digits goes into one letter. Thus the reduction is by a factor 7/8.

It is also easy to see in this case that the binary digits are equally likely and independent, and from this that the final text letters are also.
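The two-stage transformation of this example (the letter-to-binary table, then the pair-to-letter table) can be sketched as follows. Primes on the output letters are dropped here for brevity, and a trailing odd binary digit is simply discarded; that is an assumption of the sketch, since the text does not say how an odd digit would be handled.

```python
import random
from collections import Counter

PREFIX_CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
PAIRS = {"00": "A", "01": "B", "10": "C", "11": "D"}

def reduce_text(text):
    """Letters -> binary digits -> pairs of digits -> letters (primes omitted)."""
    bits = "".join(PREFIX_CODE[ch] for ch in text)
    return "".join(PAIRS[bits[k:k + 2]] for k in range(0, len(bits) - 1, 2))

print(reduce_text("ABCABACBBDAADADA"))   # BBCBADBBBDADCDC (16 letters become 15)

# A long text drawn with probabilities 1/2, 1/4, 1/8, 1/8.
rng = random.Random(0)
long_text = "".join(rng.choices("ABCD", weights=[4, 2, 1, 1], k=100_000))
out = reduce_text(long_text)
print(round(len(out) / len(long_text), 3))                 # close to 7/8
print({c: round(n / len(out), 3) for c, n in sorted(Counter(out).items())})
```

The length ratio comes out close to 7/8, and the four output letters appear with nearly equal frequencies, as the argument above predicts.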

This situation is more complicated for mixed languages, and we shall not enter into it here. We may note, however, that if

$$L = p_1 L_1 + p_2 L_2 + \cdots + p_n L_n$$

where $L_i$ is pure with rate $R_i$, then the long sequences of L fall into (n+1) groups. The first n groups correspond to the n pure components. Those in group i number about $10^{R_i N}$ and have logarithmic probability about $\log p_i - R_i N$. The last group contains all other sequences and has a small total probability.

7. Redundancy Characteristic of a Language

The form of the curve D(N) as a function of N (that is, of $D_N = R_0 - R_N$) may be called the redundancy characteristic of the language. In a rough way it describes the way in which the redundancy appears. In Fig. 5 several types of characteristics are shown, all with the same final redundancy. The way in which this final value is approached is of importance in cryptography. For languages which reach their final redundancy at one or two letters (Curves 1 and 2) one type of cipher (ideal ciphers) can be used. For those which remain near zero out to fairly large N (like Curve 5) another type is appropriate. Natural languages are apt to show a characteristic more like Curve 3, and this makes them difficult to encipher with security by simple means.

Examples:

1. A language in which successive letters are independent but with different probabilities has a characteristic of Type 1.

2. Consider a language constructed as follows. First select $26^8$ different sequences of letters, each 16 letters long, from the $26^{16}$ possible sequences of this length. This should be a random selection. The 16-letter sequences chosen are the "words" of the language. Messages are random sequences of these "words." Such a language has a characteristic like Curve 5.

3. A language with digram structure only, such as Example C in Section 2 above, has a characteristic of Type 2 in Fig. 5, reaching its final value at N = 2.

4. English has the characteristic 3 in Fig. 5.
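For Example 2 the final redundancy can be worked out directly from the definitions. A short derivation, assuming the $26^8$ words are equally likely and chosen independently (the text says only that messages are "random sequences" of words), with logarithms to any fixed base:

```latex
R_0 = \log 26, \qquad
R = \frac{\log\left(26^{8}\right)}{16} = \frac{8 \log 26}{16} = \tfrac{1}{2} R_0, \qquad
D = R_0 - R = \tfrac{1}{2} R_0 .
```

So the relative redundancy $D/R_0$ ends at 50 percent, even though short stretches of such text look essentially random. That combination is what keeps the curve near zero for small N and lets it rise only when N approaches the word length.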


The redundancy characteristic describes how the structure in the language is spread out. If the structure is localized, the curve rises rapidly to its final value. If there are long-range influences, the asymptotic value is approached more slowly. If the structure is "locally random," the curve will remain near zero for small N.

8. Secrecy Systems 

Before we can apply any mathematical analysis to secrecy systems, it is necessary to idealize the situation suitably, and to define in a mathematically acceptable way what we shall mean by a secrecy system. A "schematic" diagram of a general secrecy system is shown in Fig. 6. At the transmitting end there are two information sources: a message source and a key source. The key source produces a particular key from among those which are possible in the system. This key is transmitted by some means, supposedly not interceptible, e.g., by messenger, to the receiving end. The message source produces a message (the "clear") which is enciphered, and the resulting cryptogram is sent to the receiving end by a possibly interceptible means, for example radio. At the receiving end the cryptogram and key are combined in the decipherer to recover the message.

Evidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we have

$$E = f(M, K)$$

i.e., E is a function of M and K. We prefer to think of this, however, not as a function of two variables but as a (one parameter) family of operations or transformations, and we write it

$$E = T_i M .$$

The transformation $T_i$ applied to message M produces cryptogram E. The index i corresponds to the particular key being used. If there are m possible keys there will be m transformations in the family, $T_1, T_2, \ldots, T_m$.

At the receiving end it must be possible to recover M, knowing E and K. Thus the transformations in the family must have unique inverses

$$M = T_i^{-1} E ;$$

at any rate this inverse must exist uniquely for every E which can be obtained from an M with key i.
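A concrete one-parameter family makes the notation tangible. This is a minimal sketch that assumes, purely for illustration, the 26 cyclic shifts of the alphabet (the functions `T` and `T_inv` are hypothetical names): the key index i selects one transformation $T_i$, and each $T_i$ has a unique inverse.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def T(i, message):
    """E = T_i M: shift every letter forward by i places (m = 26 keys)."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + i) % 26] for ch in message)

def T_inv(i, cryptogram):
    """M = T_i^{-1} E: the unique inverse of T_i is the shift by -i."""
    return T(-i, cryptogram)

E = T(3, "ATTACK")
print(E, T_inv(3, E))   # DWWDFN ATTACK
```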

The key source can be thought of as a "probability machine," something which chooses from the possible keys according to a system of probabilities. Mathematically then, the keys (or the parameter of the family of transformations) belong to a probability or measure space. Hence we arrive at the following definition:

A secrecy system is a family of uniquely reversible transformations $T_i$ of a message space $\Omega_M$ into a cryptogram space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$. Conversely, any set of entities of this type will be called a "secrecy system."

The system can be visualized mechanically as a machine with one or more controls on it. A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particular setting of the controls corresponds to the particular key being used. Some method must be prescribed for choosing the key from all the possible ones.

To make the problem mathematically tractable we shall assume that the enemy knows the system being used. That is, he knows the family of transformations $T_i$ and the probabilities of choosing various keys.

One might object to this as being unrealistic, in that the cryptanalyst often does not know what system was used or the probabilities of various keys. There are two answers to this objection:

1. The assumption is actually the one ordinarily used in cryptographic studies. It is pessimistic and hence safe, but in the long run realistic (particularly in military work), since one must expect his system to be found out eventually through espionage, captured equipment, prisoners, etc. Thus, even when an entirely new system is devised, so that the enemy cannot assign any a priori probability to it without discovering it himself, one must still live with the expectation of his eventual knowledge.

2. The restriction is much weaker than appears at first, due to our broad definition of what constitutes the system. Suppose a cryptographer intercepts a message and does not know whether a substitution, transposition, or Vigenère type cipher was used. He can consider this as being enciphered by a system in which part of the key is the specification of which of these types was used, the next part being the particular key for that type. These three different possibilities are assigned probabilities according to his best guesses of the a priori probabilities of the encipherer using the respective types of cipher.
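Folding the choice of cipher type into the key gives an ordinary key space, and its size (the quantity |K| defined later in the text) splits neatly into two parts. A sketch under loudly stated assumptions: the prior guesses and key counts below are hypothetical, and the keys of each type are taken as equally likely. Each composite key K = (type, subkey) then has $P(K) = p_{type}/n_{type}$.

```python
from math import factorial, log2

# Hypothetical prior guesses p_type and key counts n_type for each cipher type.
SUBSYSTEMS = {
    "substitution": (0.5, factorial(26)),  # every simple substitution alphabet
    "transposition": (0.3, factorial(8)),  # assumed: orderings of 8 columns
    "vigenere": (0.2, 26 ** 5),            # assumed: 5-letter keywords
}

def composite_key_size(subsystems):
    """|K| = -sum P(K) log2 P(K) over composite keys K = (type, subkey).

    The n keys of one type each have P(K) = p / n, so together they
    contribute -p * log2(p / n).
    """
    return -sum(p * log2(p / n) for p, n in subsystems.values())

print(round(composite_key_size(SUBSYSTEMS), 2))
```

The sum equals the uncertainty about which type was used, $-\sum p \log_2 p$, plus the average uncertainty about the key within the type, $\sum p \log_2 n$; the type choice adds at most $\log_2 3$ binary digits here.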


A second possible objection to our definition of secrecy systems is that no account is taken of the common practice of inserting nulls in a message and the use of multiple substitutes. Thus there is not a unique $E = T_i M$; actually the encipherer can choose at will among a number of different E's for the same message and key. This situation could be handled, but would only add complexity at the present stage, without altering any of the basic results. To define the more general secrecy system, one would add a second parameter to the transformations $T_i$, which corresponds to the various choices of cryptograms corresponding to a given message and key. It is possible, but not always desirable, to consider this second parameter as part of the key, since it does not need to be transmitted to the receiving point.
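Multiple substitutes show why the second parameter need not travel with the key. In the sketch below the homophone table (hypothetical) plays the role of the key, and the encipherer's free choice among substitutes is the second parameter: it changes the cryptogram but is never needed to decipher it.

```python
import random

# Hypothetical key: each plaintext letter has one or more numeric substitutes.
HOMOPHONES = {"E": ["17", "42", "63"], "T": ["08", "51"], "A": ["29"]}
REVERSE = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}

def encipher(message, rng):
    """The choice among substitutes is made freely and not transmitted."""
    return " ".join(rng.choice(HOMOPHONES[ch]) for ch in message)

def decipher(cryptogram):
    """Deciphering needs only the key table, not the choices made."""
    return "".join(REVERSE[code] for code in cryptogram.split())

rng = random.Random(5)
for _ in range(3):
    c = encipher("TEA", rng)
    print(c, decipher(c))
```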

We also assume that the enemy is in possession of the measure in the space $\Omega_M$, that is, the a priori probabilities of the various messages. The same objection and essentially the same answer might be given to this assumption as to his knowledge of the transformations $T_i$. This measure, however, we do not consider as part of the secrecy system, for reasons which will appear later. The secrecy system whose transformations are $T_i$ will be denoted by T, and this concept includes the space $\Omega_M$ on which T operates (without its measure), the transformations $T_i$, and the spaces $\Omega_K$ and $\Omega_E$, the former with its probabilities.
If the messages are produced by a Markoff process of the type described previously, the probabilities of various messages are determined by the structure of the Markoff process. For the present, however, we wish to take a more general view of the situation and regard the messages as merely an abstract set of entities with associated probabilities, not necessarily composed of a sequence of letters and not necessarily produced by a Markoff process.

It should be emphasized that throughout the paper a secrecy system means not one but a set of many transformations. After the key is chosen only one of these transformations is used, and we might be led to define a secrecy system as a single transformation on a language.* The enemy, however, does not know what key was chosen, and the "might have been" keys are as important for him as the actual one. Indeed it is only the existence of these other possibilities that gives the system any secrecy. Since the secrecy is our primary interest, we are forced to this rather elaborate concept of a secrecy system. This type of situation, where possibilities are as important as actualities, is almost the rule in games of strategy. The course of a chess game is largely controlled by threats which are not carried out. See also the "virtual existence" of unrealized imputations in von Neumann's theory of games.

*A. A. Albert, in a paper presented at a Manhattan, Kansas, meeting of the American Mathematical Society (Nov. 22, 1941) entitled "Some Mathematical Aspects of Cryptography," has defined a ciphering system in this way. With this limited definition about all one can do is to describe and classify, from the mathematical point of view, various types of transformations.

There are a number of difficult epistemological questions connected with the theory of secrecy, or in fact with any theory which involves questions of probability (particularly a priori probabilities, Bayes' theorem, etc.) when applied to a physical situation. Treated abstractly, probability theory can be put on a rigorous logical basis with the modern measure theory approach.* As applied to reality, however, especially when "subjective" probabilities and unrepeatable experiments are concerned, there are many questions of logical validity. For example, in the approach to secrecy made here, a priori probabilities of various keys are assumed known by the enemy cryptographer. How can one determine operationally whether his estimates are correct, on the basis of his knowledge of the situation?

It may happen that the keys are chosen by the encipherer according to one system of probabilities, i.e., one measure in the key space $\Omega_K$, and that the enemy cryptanalyst estimates a second, different system of probabilities in this space which is entirely reasonable in the light of his knowledge of the situation. Which is correct? I believe that both are correct. The calculation based on the first measure leads to the solution when the enemy knows just how the keys are chosen, and the solution based on the second leads to solutions which are correct for a situation agreeing with the enemy's knowledge of the actual situation. It appears intuitively that the enemy's lack of knowledge can only do him harm, and probably this can be proved, but this question has not been investigated. In fact, we assume only one measure in the key space. Similar remarks may be made regarding the measure in the message space $\Omega_M$.

*See J. L. Doob, "Probability as Measure," Annals of Mathematical Statistics, v. 12, 1941, pp. 206-214.
A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung," Ergebnisse der Mathematik, v. 2, no. 3 (Berlin, 1933).


Actually, in practical situations only extreme errors in the a priori probabilities of keys and messages cause much error in the important parameters. This is because of the exponential behavior of the number of messages, etc., and the logarithmic measures employed.

With regard to the application of the mathematical theory of probability to physical situations, there are two main theories, or ways of setting up the correspondence. (1) The frequency theory. Probability is correlated with the relative frequency of an event. This is the correspondence used by the practicing statistician, in principle by the physicist, etc. (2) The degree of belief approach. Probability is a subjective phenomenon and measures one's degree of belief in the occurrence of an event. This approach is seen often in the work of historians, judges, and in everyday life. Although this latter approach has often been attacked as meaningless, we cannot agree with this opinion. In the first place, the intuitive approach can be given a rigorous mathematical foundation. This has been done in a very elegant way by B. O. Koopman.* Essentially one need only assume that a person be capable of making probability judgments (event A is more or less probable than event B, or they are equiprobable) and that his judgments be self-consistent (e.g., if he judges A more probable than B and B more probable than C, he should judge A more probable than C). One can even establish numerical values by the use of a "standard gauge," for example a roulette wheel, and thus relate the subjective and the frequency probabilities. In the second place, on pragmatic grounds one can hardly ignore the subjective applications, since almost all of our everyday decisions are based on this sort of probability judgment. Cryptographic work involves both types of applications. In the use of frequency tables, significance tests, etc., the cryptanalyst is following the frequency approach. In the "intuitive" methods of cryptanalysis (probable words, etc.) the degree of belief approach is more in evidence.

*B. O. Koopman, "The Axioms and Algebra of Intuitive Probability," Annals of Mathematics, v. 41, no. 2, 1940, p. 269; "Intuitive Probabilities and Sequences," Annals of Mathematics, v. 42, no. 1, 1941, p. 169.

We may remark that a single reversible operation on a language forms a degenerate type of secrecy system under our definition: a system with only one key, of unit probability. Such a system has no secrecy. The cryptanalyst finds the message by applying the inverse of this transformation, the only one in the system, to the intercepted cryptogram. The decipherer and cryptanalyst in this case possess the same information. In general, the only difference between the decipherer's knowledge and the enemy cryptanalyst's knowledge is that the decipherer knows the particular key being used, while the cryptanalyst only knows the a priori probabilities of the various keys in the set. The process of deciphering is that of applying the inverse of the particular transformation used in enciphering to the cryptogram. The process of cryptanalysis is that of attempting to determine the message (or the particular key) given only the cryptogram and the a priori probabilities of various keys and messages.
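The contrast between deciphering and cryptanalysis can be put in a few lines. In this sketch the three-key system and its a priori probabilities are hypothetical, and `system[k][M]` stands for $T_k M$. The decipherer inverts one known transformation. The cryptanalyst, knowing only the probabilities, weights every pair (M, K) with $T_K M = E$ by $P(M)P(K)$ and normalizes, which gives the a posteriori probabilities $P_E(M)$.

```python
def decipher(cryptogram, system, k):
    """The legitimate decipherer simply applies T_k^{-1}."""
    inverse = {e: m for m, e in system[k].items()}
    return inverse[cryptogram]

def posterior(cryptogram, system, key_probs, message_probs):
    """Enemy's a posteriori probabilities P_E(M) for an intercepted cryptogram."""
    weights = {}
    for k, table in system.items():
        for m, e in table.items():
            if e == cryptogram:
                weights[m] = weights.get(m, 0.0) + key_probs[k] * message_probs[m]
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()} if total else {}

# Hypothetical closed system with three keys and three messages.
SYSTEM = {
    1: {"M1": "E1", "M2": "E2", "M3": "E3"},
    2: {"M1": "E2", "M2": "E3", "M3": "E1"},
    3: {"M1": "E3", "M2": "E1", "M3": "E2"},
}
KEYS = {1: 0.5, 2: 0.25, 3: 0.25}
MESSAGES = {"M1": 0.7, "M2": 0.2, "M3": 0.1}

print(decipher("E2", SYSTEM, 2))                 # M1
print(posterior("E2", SYSTEM, KEYS, MESSAGES))   # every message remains possible
```

Intercepting E2 leaves the enemy with all three messages still possible (M1 about .58, M2 about .33, M3 about .08), whereas the holder of key 2 reads M1 at once.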

A system will be called "closed" if any possible cryptogram can be deciphered with any possible key. This means that the inverse transformations $T_i^{-1}$ are all defined for every element in the cryptogram space.
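Closure is easy to test when a system is written out as a table. A minimal sketch, assuming the hypothetical representation `system[k] = {message: cryptogram}` for each key k: the system is closed when every key maps the messages one-to-one onto the whole cryptogram space, so that every $T_k^{-1}$ is defined on every cryptogram.

```python
def is_closed(system, cryptograms):
    """True if every key's table is a one-to-one map onto all cryptograms."""
    for table in system.values():
        images = list(table.values())
        if len(set(images)) != len(images) or set(images) != set(cryptograms):
            return False
    return True

# Hypothetical examples.
CLOSED = {1: {"M1": "E1", "M2": "E2"}, 2: {"M1": "E2", "M2": "E1"}}
NOT_CLOSED = {1: {"M1": "E1", "M2": "E2"}, 2: {"M1": "E2", "M2": "E3"}}
print(is_closed(CLOSED, {"E1", "E2"}), is_closed(NOT_CLOSED, {"E1", "E2", "E3"}))  # True False
```

In the second system, E3 cannot be deciphered with key 1 and E1 cannot be deciphered with key 2.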

We shall use the notation |M| for the "size" of the message space:

$$|M| = -\sum_M P(M) \log P(M)$$

where P(M) is the probability of message M and the sum is over all messages of just N letters. Thus |M| is a function of N and measures the amount of "choice" in the selection of an N-letter message. For large N, |M| is approximately RN. Similarly |K| is the size of the key space,

$$|K| = -\sum_K P(K) \log P(K),$$

the sum being over all keys.
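The approach of |M| to RN can be checked by brute force on a small source. The sketch assumes a hypothetical pure two-letter source in which a is followed by a or b with probability 1/2 each and b is always followed by a, started in its stationary letter frequencies (2/3, 1/3). Its rate is R = 2/3 binary alternatives per letter, and for such a stationary source each additional letter adds exactly R to |M|. For a key space of m equally likely keys, |K| is simply $\log_2 m$.

```python
from itertools import product
from math import log2

# Hypothetical pure source: after a comes a or b (1/2 each); after b always a.
START = {"a": 2 / 3, "b": 1 / 3}   # stationary letter frequencies
NEXT = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 1.0}}

def message_size(N):
    """|M| = -sum P(M) log2 P(M) over all N-letter messages (brute force)."""
    size = 0.0
    for seq in product("ab", repeat=N):
        p = START[seq[0]]
        for x, y in zip(seq, seq[1:]):
            p *= NEXT[x].get(y, 0.0)
        if p > 0:
            size -= p * log2(p)
    return size

for N in (1, 2, 4, 8, 12):
    print(N, round(message_size(N), 4), round(message_size(N) / N, 4))
```

The ratio |M|/N falls toward R = 2/3 as N grows, since |M| exceeds (N - 1)R by a fixed amount, the choice of the first letter.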

9. Representation of Systems 

A secrecy system can be represented in various ways. One which is convenient for illustrative purposes is a line diagram, as in Figs. 7, 10, 11. The possible messages are represented by points at the left and the possible cryptograms by points at the right. If a certain key, say key 1, transforms message $M_2$ into cryptogram E, then $M_2$ and E are connected by a line labeled 1, etc. From each possible message there must be exactly one line emerging for each different key.

A second representation is by means of a rectangular array. This may be done in three different ways. For the closed system of Fig. 7, the three arrays are as follows:

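First array (rows: messages; columns: keys; entries: the cryptogram produced):

      K1   K2   K3
M1    E1   E4   E2
M2    E3   E1   E4
M3    E4   E3   E1
M4    E2   E2   E3

Second array (rows: messages; columns: cryptograms; entries: the keys carrying the message into the cryptogram):

      E1   E2     E3   E4
M1    1    3      -    2
M2    2    -      1    3
M3    3    -      2    1
M4    -    1, 2   3    -

Third array (rows: cryptograms; columns: keys; entries: the message recovered):

      K1   K2   K3
E1    M1   M2   M3
E2    M4   M4   M1
E3    M2   M3   M4
E4    M3   M1   M2

From the first array, for example, key 2 transforms $M_1$ into $E_4$; from the second, $M_4$ is carried into $E_2$ by either key 1 or key 2; from the third, $E_3$ is deciphered by key 3 into $M_4$. All of these arrays and the line diagram contain equivalent information; from any one the others can be derived.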
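Deriving the other arrays from the message-by-key array is a simple re-indexing. A sketch on a hypothetical closed system with three messages and two keys (not the system of Fig. 7): one dictionary gives the message recovered from each cryptogram under each key, the other gives the set of keys carrying each message into each cryptogram. The first re-indexing relies on each $T_k$ being uniquely reversible.

```python
def derive_arrays(first):
    """From first[m][k] = cryptogram, build by_cryptogram[e][k] = message
    and keys_for[m][e] = set of keys carrying m into e."""
    by_cryptogram, keys_for = {}, {}
    for m, row in first.items():
        for k, e in row.items():
            by_cryptogram.setdefault(e, {})[k] = m     # unique since T_k is reversible
            keys_for.setdefault(m, {}).setdefault(e, set()).add(k)
    return by_cryptogram, keys_for

# Hypothetical closed system: three messages, two keys.
FIRST = {"M1": {1: "E1", 2: "E2"}, "M2": {1: "E2", 2: "E1"}, "M3": {1: "E3", 2: "E3"}}
by_cryptogram, keys_for = derive_arrays(FIRST)
print(by_cryptogram)
print(keys_for)
```

Here M3 is carried into E3 by either key, the same kind of entry as the "1, 2" that can appear in a message-by-cryptogram array.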

These representations describe the set of transformations. To specify the system, the probabilities of the various keys must also be given. This may be done by merely listing the keys with their probabilities. Similarly the message probabilities may be given by listing the probabilities of the various messages.

A more common way of describing the set of transformations is to describe the operations one performs on the message for an arbitrary key to obtain the cryptogram. Similarly one defines the probabilities for various keys by describing how the key is chosen, or what we know of the enemy's habits of key choice. The probabilities for messages are implicitly determined by stating our a priori knowledge of the enemy's language, the tactical situation (which will influence the probable content of the message), and any special information we may have regarding the cryptogram.

10. Notation 



The following notation will generally be followed.

M = the message; also the message space, a probability space with the message probabilities
K = the key; also the key space, a probability space with the key probabilities
E = the enciphered message or cryptogram; also the cryptogram space, a probability space since the probabilities in M and K induce a probability for each cryptogram
$m_i$ = the i-th letter of the message
$e_i$ = the i-th letter of the cryptogram
$k_i$ = the i-th letter of the key, when it can be so described

Generally P stands for a probability. Conditional probabilities are indicated with subscripts; thus

P(M) = probability of message M
P(E) = probability of cryptogram E
P(K) = probability of key K
$P_M(E)$ = conditional probability of E if message M is chosen
$P_E(M)$ = conditional probability of M if cryptogram E is intercepted, i.e. the a posteriori probability of M if E is observed
Q = equivocation, a concept to be defined precisely later, which measures the uncertainty of some knowledge defined only by probabilities. We also have conditional equivocations; thus $Q_M(K)$ is the equivocation of the key knowing the message.
$|K| = -\sum P(K) \log P(K)$, the size of the key space
$|M| = -\sum P(M) \log P(M)$, the size of the message space
$|E| = -\sum P(E) \log P(E)$, the size of the cryptogram space
m = number of different keys
N = number of intercepted letters
$R_0$ = maximum information rate for a language
R = mean rate
$D = R_0 - R$ = redundancy of a language
T, R, S, etc. = secrecy systems
$T_i$, $R_j$, $S_k$, etc. = particular transformations of these


11. Some Examples of Secrecy Systems

In this section a number of examples of ciphers will be given. These will often be referred to in the remainder of the paper for illustrative purposes.

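1. Simple Substitution Cipher.

In this cipher each letter of the message is replaced by a fixed substitute, usually also a letter. Thus the message

$$M = m_1 m_2 m_3 m_4 \ldots$$

becomes

$$E = e_1 e_2 e_3 e_4 \ldots = f(m_1) f(m_2) f(m_3) f(m_4) \ldots$$

where the function f has an inverse. The key is a permutation of the alphabet (when the substitutes are letters): the first letter of the key is the substitute for A, the second is the substitute for B, etc.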
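The rule $e_i = f(m_i)$ can be sketched in Python for the 26 upper-case letters, the key being a permutation of the alphabet written in the order A, B, C, .... The helper `make_key`, which draws a random permutation from a seed, is a hypothetical convenience for experiments and not part of the original text.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # A..Z; other characters pass through unchanged


def make_key(seed=None):
    """Hypothetical helper: a random permutation of the alphabet, used as the key."""
    letters = list(ALPHABET)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


def encipher(message, key):
    """e_i = f(m_i): the n-th key letter is the substitute for the n-th letter (A, B, ...)."""
    return message.translate(str.maketrans(ALPHABET, key))


def decipher(cryptogram, key):
    """m_i = f^{-1}(e_i): f is a permutation, so the reversed table inverts it."""
    return cryptogram.translate(str.maketrans(key, ALPHABET))


key = make_key(seed=1)
print(decipher(encipher("NOWISTHETIME", key), key))  # NOWISTHETIME
```

Because every key is a permutation, deciphering is simply the table read in the other direction.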

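2. Transposition (Fixed Period d).

The message is divided into groups of length d and a permutation is applied to the first group, the same permutation to the second group, etc. The permutation is the key and can be represented by a permutation of the first d integers. Thus for d = 5 we might have 2 3 1 5 4 as the permutation. This means that

$$m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_{10} \ldots$$

becomes

$$m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_{10} m_9 \ldots$$

Sequential application of two or more transpositions will be called compound transposition. If the periods are $d_1, d_2, \ldots, d_s$, it is clear that the result is a transposition of period d, where d is the least common multiple of $d_1, d_2, \ldots, d_s$.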
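A minimal Python sketch of the fixed-period transposition, assuming the message length is a multiple of the period (padding with nulls is left to the caller). The second function, a hypothetical helper, pushes the positions 1, 2, ... through a sequence of transpositions and reads off the single equivalent transposition, illustrating the least-common-multiple rule for compound transposition.

```python
from math import lcm  # Python 3.9+


def transpose(message, perm):
    """Fixed-period transposition.

    perm gives, for each position of a block, the 1-based position of the letter
    taken from that block, e.g. (2, 3, 1, 5, 4) for d = 5.
    """
    d = len(perm)
    if len(message) % d:
        raise ValueError("message length must be a multiple of the period d")
    out = []
    for start in range(0, len(message), d):
        block = message[start:start + d]
        out.extend(block[p - 1] for p in perm)
    return out


def compound_as_single(perms, length):
    """Apply the transpositions in order to positions 1..length and return the
    equivalent single transposition, whose period is the lcm of the periods."""
    d = lcm(*(len(p) for p in perms))
    seq = list(range(1, length + 1))
    for p in perms:
        seq = transpose(seq, p)
    # Express each block relative to its own start; every block must agree.
    blocks = [tuple(x - start for x in seq[start:start + d])
              for start in range(0, length, d)]
    if set(blocks[0]) != set(range(1, d + 1)) or any(b != blocks[0] for b in blocks):
        raise ValueError("not a transposition of period d")
    return blocks[0]


msg = [f"m{i}" for i in range(1, 11)]
print(transpose(msg, (2, 3, 1, 5, 4)))
print(compound_as_single([(2, 1), (2, 3, 1)], 12))  # (1, 4, 2, 6, 5, 3)
```

The first call reproduces the d = 5 example; the second shows periods 2 and 3 combining into one transposition of period 6.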

3. Vigenère and Variations.

In this cipher the key consists of a series of d letters. These are written repeatedly below the message and the two are added modulo 26 (considering the alphabet numbered from A = 0 to Z = 25). Thus

$$e_i = m_i + k_i \pmod{26}$$

where $k_i$ is of period d in the index i. For example, with the key G A H we obtain

message        N O W I S T H E ...
repeated key   G A H G A H G A ...
cryptogram     T O D O S A N E ...

The Vigenère of period 1 is called the Caesar cipher. It is a simple substitution in which each letter of M is advanced a fixed amount in the alphabet. This amount is the key, which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort are similar to the Vigenère, and encipher by the equations

$$e_i = k_i - m_i \pmod{26}$$

$$e_i = m_i - k_i \pmod{26}$$

respectively. The Beaufort of period one is called the reversed Caesar cipher.
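The three equations can be sketched in Python for upper-case letters A to Z only (an assumption of the sketch), with the key repeated at its period d = len(key).

```python
A = ord("A")


def _to_nums(text):
    return [ord(c) - A for c in text]


def _to_text(nums):
    return "".join(chr(n % 26 + A) for n in nums)  # Python's % is non-negative here


def vigenere(message, key):
    """e_i = m_i + k_i (mod 26), the key repeated with period d = len(key)."""
    k = _to_nums(key)
    return _to_text(m + k[i % len(k)] for i, m in enumerate(_to_nums(message)))


def beaufort(message, key):
    """e_i = k_i - m_i (mod 26)."""
    k = _to_nums(key)
    return _to_text(k[i % len(k)] - m for i, m in enumerate(_to_nums(message)))


def variant_beaufort(message, key):
    """e_i = m_i - k_i (mod 26); with the same key this undoes the Vigenère."""
    k = _to_nums(key)
    return _to_text(m - k[i % len(k)] for i, m in enumerate(_to_nums(message)))


print(vigenere("NOWISTHE", "GAH"))  # TODOSANE
print(vigenere("HELLO", "D"))       # KHOOR: period 1, the Caesar cipher
```

Since the Variant Beaufort subtracts what the Vigenère adds, one serves as the decipherer for the other.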

The application of two or more Vigenères in sequence will be called the compound Vigenère. It has the equation

$$e_i = m_i + k_i + l_i + \cdots + s_i \pmod{26}$$

where $k_i, l_i, \ldots, s_i$ in general have different periods. The period of their sum,

$$k_i + l_i + \cdots + s_i,$$

as in compound transposition, is the least common multiple of the individual periods.

4. Vernam System.*

When the Vigenère is used with an unlimited key, never repeating, we have the Vernam system, with

$$e_i = m_i + k_i \pmod{26},$$

the $k_i$ being chosen at random and independently among 0, 1, ..., 25. If the key is a meaningful text we have the "running key" cipher.
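A sketch of the Vernam system in Python. The use of the standard `secrets` module as the source of random key values is an assumption of the sketch; the text only requires that the $k_i$ be chosen at random and independently. A running key corresponds to passing the letter values of a meaningful text as the key list.

```python
import secrets


def vernam_key(length):
    """Each k_i drawn at random and independently from 0, 1, ..., 25."""
    return [secrets.randbelow(26) for _ in range(length)]


def vernam(text, key, decipher=False):
    """e_i = m_i + k_i (mod 26); with decipher=True, m_i = e_i - k_i (mod 26).

    The key must be at least as long as the text, since it is never repeated."""
    if len(key) < len(text):
        raise ValueError("key shorter than text; a Vernam key never repeats")
    sign = -1 if decipher else 1
    return "".join(chr((ord(c) - 65 + sign * k) % 26 + 65)
                   for c, k in zip(text, key))


key = vernam_key(12)
cryptogram = vernam("SENDSUPPLIES", key)
print(vernam(cryptogram, key, decipher=True))  # SENDSUPPLIES
```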

5. Bazeries Cylinder.

In this mechanical system 25 thick disks are used, each having a mixed alphabet stamped around the edge. These disks can be arranged in any order on a spindle, and the particular arrangement used constitutes the key. With the disks in their proper order, a message is enciphered by turning the disks so that the message appears on a line parallel to the axis of the spindle. Any other line of letters may then be chosen for the cryptogram. To decipher, the cryptogram is arranged on a line and the decipherer looks for another line which then makes sense.

*G. S. Vernam, "Cipher Printing Telegraph Systems for Secret Wire and Radio Telegraphic Communications," Journal Amer. Inst. of Elect. Eng., v. XLV, pp. 109-115, 1926.

6. Digram, Trigram, and N-gram Substitution.

Rather than substitute for letters, one can substitute for digrams, trigrams, etc. General digram substitution requires a key consisting of a permutation of the $26^2$ digrams. It can be represented by a table in which the row corresponds to the first letter of the digram and the column to the second letter, entries in the table being the substitutes (usually also digrams).

7. Interrupted Key Vigenère.

The Vigenère and its variations can be used with an interrupted key. The sequence of key letters is started again at irregularly spaced points. Thus, if the entire key sequence is XPGHFTRS, one can interrupt it irregularly to get

X P G H F T ... [remainder of the example illegible in the scan]

The points of interruption can be determined in various ways: (1) whenever a certain letter occurs in the clear; (2) whenever a certain letter occurs in the cryptogram; (3) an interrupting letter, say J, can be reserved as a signal, and the encipherer interrupts the key at his discretion; (4) no signal is used, and the decipherer locates the interruption by the appearance of meaningless text in the decipherment. In place of starting the key again at each interruption, one can omit letters of it or reverse the direction of progress. There are many variations and combinations of these methods.

8. Single Mixed Alphabet Vigenère.

This is a simple substitution followed by a Vigenère (all arithmetic mod 26):

$$e_i = f(m_i) + k_i, \qquad m_i = f^{-1}(e_i - k_i).$$

The "inverse" of this system is a Vigenère followed by simple substitution:

$$e_i = g(m_i + k_i), \qquad m_i = g^{-1}(e_i) - k_i.$$



9. Vigenère with Progressing Key.

The period of a Vigenère can be expanded by adding a fixed number t to the key at each appearance; thus the n-th group is enciphered by the equation

$$e_i = m_i + k_i + nt \pmod{26}.$$

Also this can be varied by adding t and s alternately to the key, etc.

10. Matrix System.*

One method of n-gram substitution is to operate on successive n-grams with a matrix having an inverse. The letters are assumed numbered from 0 to 25, making them elements of an algebraic ring. From the n-gram $m_1 m_2 \ldots m_n$ of message, the matrix $a_{ij}$ gives an n-gram of cryptogram

$$e_i = \sum_{j=1}^{n} a_{ij} m_j \pmod{26}, \qquad i = 1, \ldots, n.$$

The matrix $a_{ij}$ is the key, and deciphering is performed with the inverse matrix. The inverse matrix will exist if and only if the determinant $|a_{ij}|$ has an inverse element in the ring.
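A sketch of the matrix system for n = 2, where the key matrix shown is a hypothetical example. In the ring of integers mod 26 the determinant has an inverse exactly when it shares no factor with 26; Python's three-argument `pow` with exponent -1 (Python 3.8+) computes that inverse or raises `ValueError`.

```python
def inverse_mod26(a):
    """Inverse of the 2x2 matrix a over the integers mod 26, or None if none exists."""
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % 26
    try:
        det_inv = pow(det, -1, 26)  # raises ValueError if gcd(det, 26) != 1
    except ValueError:
        return None
    return [[(a[1][1] * det_inv) % 26, (-a[0][1] * det_inv) % 26],
            [(-a[1][0] * det_inv) % 26, (a[0][0] * det_inv) % 26]]


def apply_matrix(a, text):
    """e_i = sum_j a_ij m_j (mod 26) on successive digrams; len(text) must be even."""
    if len(text) % 2:
        raise ValueError("pad the text to an even length")
    nums = [ord(c) - 65 for c in text]
    out = []
    for x, y in zip(nums[0::2], nums[1::2]):
        out += [(a[0][0] * x + a[0][1] * y) % 26, (a[1][0] * x + a[1][1] * y) % 26]
    return "".join(chr(n + 65) for n in out)


key = [[3, 3], [2, 5]]  # hypothetical key: det = 9, invertible mod 26
cryptogram = apply_matrix(key, "HELP")
print(cryptogram, apply_matrix(inverse_mod26(key), cryptogram))  # HIAT HELP
```

A matrix such as [[2, 4], [1, 3]] has determinant 2, which has no inverse mod 26, so it cannot serve as a key.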

11. The Playfair Cipher.

This is a particular type of digram substitution governed by a mixed 25-letter alphabet written in a 5 x 5 square. (The letter J is often dropped in cryptographic work; it is very infrequent, and when it occurs it can be replaced by I.) Suppose the key square is as shown below:

L Z Q C P
A G N O U
R D M I F
K Y H V S
X B T E W

*See L. S. Hill, "Cryptography in an Algebraic Alphabet," American Math. Monthly, v. 36, No. 6, 1929, pp. 306-312. Also "Concerning Certain Linear Transformation Apparatus of Cryptography," v. 38, No. 3, 1931, pp. 135-154.

The substitute for a digram AC, for example, is the pair of letters at the other corners of the rectangle defined by A and C, i.e. LO, the L taken first since it is above A. If the digram letters are on a horizontal line, as RI, one uses the letters to their right, DF; RF becomes DR. If the letters are on a vertical line, the letters below them are used. Thus PS becomes UW. If the letters are the same, nulls may be used to separate them, or one may be omitted, etc.
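A sketch of the digram rules in Python, using the key square of the examples. For the rectangle case the code reads the text's "L taken first since it is above A" as: the first substitute letter lies in the first plaintext letter's column. That reading is an interpretation, chosen because it reproduces all four examples.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]  # key square of the examples


def playfair_digram(digram, square=SQUARE):
    """Substitute for one digram of two distinct letters (J assumed replaced by I)."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (r1, c1), (r2, c2) = pos[digram[0]], pos[digram[1]]
    if (r1, c1) == (r2, c2):
        raise ValueError("repeated letter: separate it with a null first")
    if r1 == r2:  # same row: letters to the right, wrapping around
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:  # same column: letters below, wrapping around
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    # Rectangle: the other two corners, first the one in the first letter's column.
    return square[r2][c1] + square[r1][c2]


for d in ("AC", "RI", "RF", "PS"):
    print(d, "->", playfair_digram(d))  # LO, DF, DR, UW
```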

12. Multiple Mixed Alphabet Substitution.

In this cipher there is a set of d simple substitutions which are used in sequence. If the period d is four,

$$m_1 m_2 m_3 m_4 m_5 m_6 \ldots$$

becomes

$$f_1(m_1) f_2(m_2) f_3(m_3) f_4(m_4) f_1(m_5) f_2(m_6) \ldots$$


13. Autokey Cipher.

A Vigenère-type system in which either the message itself or the resulting cryptogram is used for the "key" is called an autokey cipher. The encipherment is started with a "priming key" (which is the entire key in our sense) and continued with the message or cryptogram displaced by the length of the priming key, as indicated below with the priming key COMET.

The message used as "key":

message      S E N D S U P P L I E S ...
key          C O M E T S E N D S U P ...
cryptogram   U S Z H L M T C O A Y H ...

The cryptogram used as "key":

message      S E N D S U P P L I E S ...
key          C O M E T U S Z H L O H ...
cryptogram   U S Z H L O H O S T S Z ...
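A Python sketch of both autokey variants, assuming upper-case letters only. The function returns the output together with the key actually used, so it can be checked against the COMET key rows given for the message SENDSUPPLIES.

```python
def _n(c):
    return ord(c) - 65


def _c(n):
    return chr(n % 26 + 65)


def autokey(text, priming, use_cryptogram=False, decipher=False):
    """Vigenère-type autokey cipher.

    The key is the priming key followed by the message (or, if use_cryptogram,
    the cryptogram), displaced by the length of the priming key.
    Returns (output, key actually used).
    """
    key, out = list(priming), []
    for i, ch in enumerate(text):
        res = _c(_n(ch) - _n(key[i])) if decipher else _c(_n(ch) + _n(key[i]))
        out.append(res)
        message_letter, cryptogram_letter = (res, ch) if decipher else (ch, res)
        key.append(cryptogram_letter if use_cryptogram else message_letter)
    return "".join(out), "".join(key[:len(text)])


print(autokey("SENDSUPPLIES", "COMET"))
print(autokey("SENDSUPPLIES", "COMET", use_cryptogram=True))
```

This should print ('USZHLMTCOAYH', 'COMETSENDSUP') for the message-as-key variant and ('USZHLOHOSTSZ', 'COMETUSZHLOH') for the cryptogram-as-key variant.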



14. Fractional Ciphers.

In these, each letter is first enciphered into two or more letters or numbers, and these symbols are somehow mixed (e.g. by transposition). The result may then be retranslated into the original alphabet. Thus, using a mixed 25-letter alphabet for the key, we may translate letters into two-digit quinary numbers by the table

     0 1 2 3 4
 0   L Z Q C P
 1   A G N O U
 2   R D M I F
 3   K Y H V S
 4   X B T E W

Thus B becomes 41. After the resulting series of numbers is transposed in some way, they are taken in pairs and translated back into letters.
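The text leaves the mixing step open ("transposed in some way"). The Python sketch below uses the quinary table above, with one hypothetical choice of mixing: write all row digits, then all column digits, and re-pair. Any invertible rearrangement of the digit sequence would serve equally well.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]  # rows 0-4, columns 0-4
COORD = {ch: (r, c) for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}


def to_digits(text):
    """Each letter becomes a two-digit quinary number: row digit, then column digit."""
    return [d for ch in text for d in COORD[ch]]


def from_digits(digits):
    """Take the digits in pairs and translate each pair back into a letter."""
    return "".join(SQUARE[r][c] for r, c in zip(digits[0::2], digits[1::2]))


def fractionate(text):
    digits = to_digits(text)
    # Hypothetical mixing step: all row digits first, then all column digits.
    return from_digits(digits[0::2] + digits[1::2])


def unfractionate(cryptogram):
    mixed = to_digits(cryptogram)
    n = len(mixed) // 2
    # Undo the mixing: re-interleave the row digits and the column digits.
    return from_digits([d for pair in zip(mixed[:n], mixed[n:]) for d in pair])


print(to_digits("B"))                              # [4, 1], i.e. B becomes 41
print(unfractionate(fractionate("SENDSUPPLIES")))  # SENDSUPPLIES
```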

15. Codes.

In codes, words (or sometimes syllables) are replaced by substitute letter groups. Sometimes a cipher of one kind or another is applied to the result.


12. Valuations of Secrecy Systems

There are a number of different criteria that should be applied in estimating the value of a proposed secrecy system. The more important of these are:

1. Amount of Secrecy.

There are some systems that are perfect: the enemy is no better off after intercepting any amount of material than before. Other systems, although giving him some information, do not yield a unique "solution" to intercepted cryptograms. Among the uniquely solvable systems, there are wide variations in the amount of labor required to effect this solution and in the amount of material that must be intercepted to make the solution unique.

CONFIDENTIAL

2. Size of Key.

The key must be transmitted by non-interceptible 
means from transmitting to receiving ends. Sometimes it must 
be memorized. It is desirable then to have the key as small 
as possible. 

3. Complexity of Enciphering and Deciphering Operations.

These should, of course, be as simple as possible. If they are done manually, complexity leads to loss of time, errors, etc. If done mechanically, complexity leads to large, expensive machines.

4. Propagation of Errors.

In certain types of secrecy systems an error of one letter in enciphering or transmission leads to a large amount of error in the deciphered text. The errors are spread out by the deciphering operation, causing the loss of much information and frequent need for repetition of the cryptogram. It is naturally desirable to minimize this error expansion.
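As an illustration (not taken from the original), the two autokey variants of section 11 differ sharply in error propagation. With the message as key, a wrongly deciphered letter becomes a wrong key letter d places later, so the error repeats every d letters to the end. With the cryptogram as key, only two letters are affected. The test message and garbled position below are arbitrary choices.

```python
def _n(c):
    return ord(c) - 65


def _c(n):
    return chr(n % 26 + 65)


def autokey(text, priming, use_cryptogram, decipher=False):
    """Autokey cipher with the message or the cryptogram as the continuing key."""
    key, out = list(priming), []
    for i, ch in enumerate(text):
        res = _c(_n(ch) - _n(key[i])) if decipher else _c(_n(ch) + _n(key[i]))
        out.append(res)
        cryptogram_letter = ch if decipher else res
        message_letter = res if decipher else ch
        key.append(cryptogram_letter if use_cryptogram else message_letter)
    return "".join(out)


def errors_after_one_garble(message, priming, use_cryptogram, position):
    """Change one cryptogram letter and count wrong letters in the decipherment."""
    c = autokey(message, priming, use_cryptogram)
    garbled = c[:position] + _c(_n(c[position]) + 1) + c[position + 1:]
    plain = autokey(garbled, priming, use_cryptogram, decipher=True)
    return sum(a != b for a, b in zip(plain, message))


msg = "SENDSUPPLIES" * 3 + "NOW"  # 39 letters
print(errors_after_one_garble(msg, "COMET", use_cryptogram=False, position=2))  # 8
print(errors_after_one_garble(msg, "COMET", use_cryptogram=True, position=2))   # 2
```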

5. Expansion of Message.

In some types of secrecy systems the size of the message is increased by the enciphering process. This undesirable effect may be seen in systems where one attempts to swamp
out message statistics by the addition of many nulls, or where
multiple substitutes are used. It also occurs in many "conceal- 
ment" types of systems (which are not usually secrecy systems 
in the sense of our definition). 

13. Equivalence Classes in the Key Space

It may happen that in a ciphering system two or more different keys, say keys 1, 2, and 7, are equivalent. By this we mean that for every M

$$T_1 M = T_2 M = T_7 M.$$

These keys will not be considered as distinct but will be thrown into an equivalence class. It is clear that the cryptanalyst can never determine which particular one of these was used, but only (at best) the class. The probability for the class is of course the sum of the probabilities of the different keys in the class.

As an example, in the Playfair cipher with the square given above, the following are equivalent key squares:

[The two example squares are illegible in the scan.]

We can think of the possible equivalence classes in this case as arrangements of a 25-letter alphabet on a 5 x 5 square on an oriented torus. The number of different keys is not 25! but $25!/5^2 = 24!$.
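The torus picture can be checked mechanically. The Python sketch below re-implements the digram rules of the Playfair example (same rectangle convention as there) and confirms that all 25 cyclic shifts of the key square give the same substitute for every digram of distinct letters. This supports dividing 25! by $5^2$; it does not by itself prove that no further equivalences exist.

```python
import itertools
from math import factorial

SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]


def digram_sub(square, a, b):
    """Playfair substitute for the digram ab, with the rules of the earlier example."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (r1, c1), (r2, c2) = pos[a], pos[b]
    if r1 == r2:
        return square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5]
    if c1 == c2:
        return square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2]
    return square[r2][c1] + square[r1][c2]


def torus_shift(square, dr, dc):
    """Move every letter dr rows down and dc columns right, wrapping around."""
    return ["".join(square[(r - dr) % 5][(c - dc) % 5] for c in range(5))
            for r in range(5)]


def same_cipher(sq1, sq2):
    """True if both squares give the same substitute for every digram of distinct letters."""
    return all(digram_sub(sq1, a, b) == digram_sub(sq2, a, b)
               for a, b in itertools.permutations("".join(sq1), 2))


shifts_equivalent = all(same_cipher(SQUARE, torus_shift(SQUARE, dr, dc))
                        for dr in range(5) for dc in range(5))
print(shifts_equivalent, factorial(25) // 25 == factorial(24))  # True True
```

By contrast, exchanging two letters of the square (for example L and Z) changes the substitute for AC, so that square is a genuinely different key.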

When we say that two secrecy systems are the same, we mean that they consist of the same set of transformations, with the same message and cryptogram spaces (domain and range), and the same probabilities for the different keys (after identical transformations are put in the same equivalence class).

14. The Algebra of Secrecy Systems

If we have two secrecy systems T and R we can often combine them in various ways to form a new secrecy system S. If T and R have the same domain (message space) we may form a kind of "weighted sum,"

$$S = pT + qR$$

where p + q = 1. This operation consists of first making a preliminary choice, with probabilities p and q, determining which of T and R is used. This choice is part of the key of S. After this is determined, T or R is used as originally defined. The total key of S must specify which of T and R is used and which key of T (or R) is used.

If T consists of the transformations $T_1, \ldots, T_m$ with probabilities $p_1, \ldots, p_m$ and R consists of $R_1, \ldots, R_k$ with probabilities $q_1, \ldots, q_k$, then S = pT + qR consists of the transformations $T_1, T_2, \ldots, T_m, R_1, \ldots, R_k$ with probabilities $pp_1, pp_2, \ldots, pp_m, qq_1, qq_2, \ldots, qq_k$.

More generally we can form the sum of a number of systems

$$S = p_1 T + p_2 R + \cdots + p_m U, \qquad \sum p_i = 1.$$

We note that any system T can be written as a sum of fixed operations

$$T = p_1 T_1 + p_2 T_2 + \cdots + p_m T_m,$$

$T_i$ being a definite enciphering operation of T corresponding to key choice i, which has probability $p_i$.

A second way of combining two secrecy systems is by taking the "product," shown schematically in Fig. 8. Suppose T and R are two systems and the domain (language space) of T can be identified with the range (cryptogram space) of R. Then we can apply first R to our language and then T to the result of this enciphering process. This gives a resultant operation S which we write as a product

$$S = TR.$$

The key for S consists of both keys of T and R, which are assumed chosen according to their original probabilities and independently. Thus if the m keys of T are chosen with probabilities

$$p_1, p_2, \ldots, p_m$$

and the n keys of R have probabilities

$$p'_1, p'_2, \ldots, p'_n,$$

then S has mn keys (at most; there may, and often will, be equivalence classes) with probabilities $p_i p'_j$. This type of product encipherment is often used; for example one follows a substitution by a transposition, or a transposition by a Vigenère, or applies a code to the text and enciphers the result by substitution, transposition, fractionation, etc.

A more special type of product may be defined in case both T and R have keys of the same size which may be put in one-to-one correspondence, with the same probabilities for corresponding keys. This may be called the "inner product," in contrast with the above, which may be more completely described as an "outer product" (these names are derived from a rough analogy with the concepts of tensor analysis). In the inner product, written

$$S = T \circ R$$

and indicated schematically in Fig. 9, the same key (or corresponding keys) is used for both T and R, chosen with the common probabilities.

For example, one may construct a transposition cipher whose key is a permutation of the alphabet, each permutation being equally likely, and apply first this and then a substitution based on the same permutation. One also sees this situation in certain geometrical types of transposition ciphers, where the text is written into a square and a permutation based on a key word is applied first to the columns and then to the rows of the square.

It may be noted that multiplication (of either kind) is not in general commutative (we do not always have RS = SR), although in special cases, such as substitution and transposition, it is. Since it represents an operation, it is by definition associative; that is, R(ST) = (RS)T = RST. Furthermore we have the laws

$$p(p'T + q'R) + qS = pp'T + pq'R + qS \qquad \text{(weighted associative law for addition)}$$

$$T(pR + qS) = pTR + qTS, \qquad (pR + qS)T = pRT + qST \qquad \text{(right and left hand distributive laws)}$$

$$p_1 T + p_2 T + p_3 R = (p_1 + p_2) T + p_3 R.$$

Finally, with regard to this algebraic structure of secrecy operations, we note that every closed secrecy system T has an "inverse" T' obtained by interchanging the E and M spaces, with the key probabilities the same, and

$$(TRS)' = S'R'T', \qquad (pT + qR)' = pT' + qR'.$$

Note that TT' is not in general the identity (this is the reason we do not write $T^{-1}$).

A system whose M and E spaces can be identified, a very common case, as when letter sequences are transformed into letter sequences, may be termed endomorphic. An endomorphic system T may be raised to a power $T^n$.

A secrecy system T whose outer product with itself is equal to T, i.e. for which

$$TT = T,$$

will be called idempotent. For example, simple substitution, transposition of period p, and Vigenère of period p (all with each key equally likely) are idempotent.
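A small finite model makes the algebra concrete. In this hypothetical Python sketch, a transformation of the message space {0, ..., n-1} is a tuple of images, and a secrecy system is a dict from transformations to probabilities, with `Fraction` keeping the equalities exact. Weighted sum and outer product follow the definitions above, pooling identical transformations into one equivalence class. A Caesar-type system with equally likely keys comes out idempotent, while two single transformations need not commute.

```python
from collections import defaultdict
from fractions import Fraction


def compose(t, r):
    """The transformation t r: apply r first, then t (as in the product S = T R)."""
    return tuple(t[r[x]] for x in range(len(r)))


def weighted_sum(p, T, R):
    """S = pT + qR with q = 1 - p; identical transformations are merged."""
    S = defaultdict(Fraction)
    for t, pt in T.items():
        S[t] += p * pt
    for r, pr in R.items():
        S[r] += (1 - p) * pr
    return dict(S)


def outer_product(T, R):
    """S = T R: keys chosen independently; equivalent keys pooled into one class."""
    S = defaultdict(Fraction)
    for t, pt in T.items():
        for r, pr in R.items():
            S[compose(t, r)] += pt * pr
    return dict(S)


def caesar(n):
    """All n cyclic shifts of {0, ..., n-1}, each key equally likely."""
    return {tuple((x + k) % n for x in range(n)): Fraction(1, n) for k in range(n)}


C = caesar(5)
print(outer_product(C, C) == C)  # True: this system is idempotent
t, r = (1, 0, 2), (0, 2, 1)      # two single transformations of {0, 1, 2}
print(outer_product({t: Fraction(1)}, {r: Fraction(1)}) ==
      outer_product({r: Fraction(1)}, {t: Fraction(1)}))  # False: not commutative
```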

The set of all endomorphic secrecy systems defined in a fixed message space constitutes an "algebraic variety," that is, a kind of algebra, using the operations of addition and multiplication. In fact, the properties of addition and multiplication which we have discussed lead to the following result:

Theorem 1: The set of endomorphic ciphers with the same message space and the two combining operations of weighted addition and outer multiplication forms a linear associative algebra with a unit element, apart from the fact that the coefficients in a weighted addition must be non-negative and sum to unity.

It should be emphasized that these combining operations of addition and multiplication apply to secrecy systems as a whole. The product of two systems TR should not be confused with the product of the transformations in the systems, $T_i R_j$, which also appears often in this work. The former, TR, is a secrecy system, i.e. a set of transformations with associated probabilities; the latter is a particular transformation. Further, the sum of two systems pR + qT is a system; the sum of two transformations is not defined. The systems T and R may commute without the individual $T_i$ and $R_j$ commuting. For example, if R is a Beaufort system of a given period, all keys equally likely,

$$R_i R_j \neq R_j R_i$$

in general, but of course RR does not depend on its order; actually

$$RR = V,$$

the Vigenère of the same period with random key. On the other hand, if the individual $T_i$ and $R_j$ of two systems T and R commute, then the systems commute.
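The Beaufort claim can be verified for period one in Python, where $R_k$ sends m to $k - m \pmod{26}$. The choice of period one is made for brevity; for period d one would expect the same reasoning to apply at each position modulo d. The sketch checks that individual Beaufort transformations fail to commute, while the system product RR equals the Vigenère (here Caesar) system with random key.

```python
from collections import defaultdict
from fractions import Fraction

N = 26  # period 1 for brevity: every letter is handled the same way


def beaufort(k):
    """Beaufort transformation R_k: m -> k - m (mod 26), as a tuple of images."""
    return tuple((k - m) % N for m in range(N))


def vigenere(k):
    """Vigenère (Caesar) transformation: m -> m + k (mod 26)."""
    return tuple((m + k) % N for m in range(N))


def compose(t, r):
    """Apply r first, then t."""
    return tuple(t[r[x]] for x in range(N))


def product(T, R):
    """Outer product of two systems given as {transformation: probability}."""
    S = defaultdict(Fraction)
    for t, pt in T.items():
        for r, pr in R.items():
            S[compose(t, r)] += pt * pr
    return dict(S)


R = {beaufort(k): Fraction(1, N) for k in range(N)}
V = {vigenere(k): Fraction(1, N) for k in range(N)}

print(compose(beaufort(3), beaufort(5)) == compose(beaufort(5), beaufort(3)))  # False
print(product(R, R) == V)  # True
```

Here $R_3 R_5$ sends m to $m - 2$ and $R_5 R_3$ sends m to $m + 2$, yet the pooled probabilities of RR are uniform over all 26 shifts.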


It is rather surprising to find an algebraic variety with as much structure as a linear associative algebra in which the elements have the complexity of ciphers. In Hilbert space theory, for example, one has a linear associative algebra, but the elements of the algebra are transformations. Here the elements are sets of transformations with a probability space associated with the transformation parameter.

These combining operations give us ways of constructing many new types of secrecy systems from certain ones, such as the examples given. We may also use them to describe the situation facing a cryptanalyst when attempting to solve a cryptogram of unknown type. He is, in fact, solving a secrecy system of the type

$$T = p_1 A + p_2 B + \cdots + p_r S + p' X, \qquad \sum p = 1,$$

where the A, B, ..., S are known types of ciphers, with the $p_i$ their a priori probabilities in this situation, and $p'X$ corresponds to the possibility of a completely new unknown type of cipher.

In weighted addition the key size of the result is given by

$$|K| = p|K_1| + q|K_2| - (p \log p + q \log q) = p|K_1| + q|K_2| + |K_3|,$$

i.e. the weighted mean of the two key sizes plus the size $|K_3| = -(p \log p + q \log q)$ of the p, q key. This holds only when there are no equivalences; if there are, the key size will always be less.

For the outer product the key size is

$$|K| \le |K_1| + |K_2|,$$

with equality only when there are no equivalences. In the inner product

$$|K| \le |K_1| = |K_2|,$$

with equality under the same condition.
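The weighted-sum formula can be checked numerically. The Python sketch below measures key size in bits (base-2 logarithms, an assumption, since the text does not fix the base) and uses hypothetical key probabilities for T and R. It compares the entropy of the combined key list with $p|K_1| + q|K_2| + |K_3|$, and also checks that independent keys with no equivalences give $|K_1| + |K_2|$ for the outer product.

```python
from math import isclose, log2


def key_size(probs):
    """|K| = -sum P(K) log P(K), measured in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def weighted_sum_keys(p, keys1, keys2):
    """Key probabilities of S = pT + qR, assuming no key of T is equivalent to one of R."""
    q = 1 - p
    return [p * k for k in keys1] + [q * k for k in keys2]


k1 = [0.5, 0.25, 0.25]  # hypothetical: |K_1| = 1.5 bits
k2 = [0.5, 0.5]         # hypothetical: |K_2| = 1 bit
p, q = 0.25, 0.75

direct = key_size(weighted_sum_keys(p, k1, k2))
formula = p * key_size(k1) + q * key_size(k2) + key_size([p, q])  # last term is |K_3|
print(isclose(direct, formula))  # True

outer = [a * b for a in k1 for b in k2]  # independent keys, no equivalences
print(isclose(key_size(outer), key_size(k1) + key_size(k2)))  # True
```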


15. Pure and Mixed Ciphers

Certain types of ciphers, such as the simple substitution, the transposition of a given period, the Vigenère of a given period, the mixed alphabet Vigenère, etc. (all with each key equally likely), have a certain homogeneity with respect to key. Whatever the key, the enciphering, deciphering and decrypting processes are essentially the same. This may be contrasted with the cipher

$$pS + qT$$

where S is a simple substitution and T a transposition of a given period. In this case the entire system changes for enciphering, deciphering and decryptment, depending on whether the substitution or the transposition was used.

The cause of the homogeneity in certain ciphers stems from the group property. We notice that in the above examples of homogeneous ciphers the product $T_i T_j$ of any two transformations in the set is equal to a third transformation $T_k$ in the set, while $T_i S_j$ does not equal any transformation in the cipher

$$pS + qT,$$

which contains only substitutions and transpositions, not products.

We might define a "pure" cipher, then, as one whose $T_i$ form a group. This, however, would be too restrictive, since it requires that the E space be the same as the M space, i.e. that the system be endomorphic. The fractional transposition is as homogeneous as the ordinary transposition without being endomorphic. The proper definition is the following: a cipher T is pure if for every $T_i, T_j, T_k$ there is a $T_s$ such that

$$T_i T_j^{-1} T_k = T_s,$$

and every key is equally likely. Otherwise the cipher is mixed. The systems of Fig. 7 are mixed. Fig. 10 is pure if all keys are equally likely.

Theorem 2: In a pure cipher the operations $T_i^{-1} T_j$, which transform the message space into itself, form a group whose order is m, the number of different keys.

For

$$T_j^{-1} T_k T_k^{-1} T_j = I,$$

so that each element has an inverse. The associative law is true since these are operations, and the group property follows from

$$T_i^{-1} T_j T_k^{-1} T_l = T_s^{-1} T_k T_k^{-1} T_l = T_s^{-1} T_l,$$

using our assumption that $T_i^{-1} T_j = T_s^{-1} T_k$ for some s.

The operation $T_i^{-1} T_j$ means, of course, enciphering the message with key j and then deciphering with key i, which brings us back to the message space. If T is endomorphic, i.e. the $T_i$ themselves transform the space M into itself (as is the case with most ciphers, where both the message space and the cryptogram space consist of sequences of letters), and the $T_i$ form a group and are equally likely, then T is pure, since

$$T_i T_j^{-1} T_k = T_i T_r = T_s.$$

Theorem 3: The outer product of two pure ciphers which commute is pure.

For if T and R commute, $T_i R_j = R_l T_m$ for every i, j with suitable l, m, and

$$T_i R_j (T_k R_l)^{-1} T_m R_n = T_i R_j R_l^{-1} T_k^{-1} T_m R_n = R_u R_v^{-1} R_w T_r T_s^{-1} T_t = R_h T_g.$$

The commutation condition is not necessary, however, for the product to be a pure cipher.

A system with only one key, i.e. a single definite operation $T_1$, is pure, since the only choice of indices is

$$T_1 T_1^{-1} T_1 = T_1.$$

Thus the expansion of a general cipher into a sum of such simple transformations also exhibits it as a sum of pure ciphers.

An examination of the example of a pure cipher shown in Fig. 5 discloses certain properties. The messages fall into certain subsets which we will call residue classes, and the possible cryptograms are divided into corresponding residue classes. There is at least one line from each message in a class to each cryptogram in the corresponding class, and no line between classes which do not correspond. The number of messages in a class is a divisor of the total number of keys. The number of lines "in parallel" from a message M to a cryptogram in the corresponding class is equal to the number of keys divided by the number of messages in the class containing the message (or cryptogram). It is shown in the appendix that these properties hold in general for pure ciphers. Summarized in a more formal statement, we have:

Theorem 4: In a pure system the messages can be divided into a set of "residue classes" $C_1, C_2, \ldots, C_s$ and the cryptograms into a corresponding set of residue classes $C'_1, C'_2, \ldots, C'_s$ with the following properties:

(1) The message residue classes are mutually exclusive and collectively contain all possible messages. Similarly for the cryptogram residue classes.

(2) Enciphering any message in $C_i$ with any key produces a cryptogram in $C'_i$. Deciphering any cryptogram in $C'_i$ with any key leads to a message in $C_i$.

(3) The number of messages in $C_i$, say $\varphi_i$, is equal to the number of cryptograms in $C'_i$ and is a divisor of k, the number of keys.

(4) Each message in $C_i$ can be enciphered into each cryptogram in $C'_i$ by exactly $k/\varphi_i$ different keys. Conversely, each cryptogram in $C'_i$ can be deciphered into each message in $C_i$ by exactly $k/\varphi_i$ different keys.

The importance of the concept of a pure cipher (and the reason for the name) lies in the fact that in a pure cipher all keys are essentially the same. Whatever key is used for a particular message, the a posteriori probabilities of all messages are identical. To see this, note that two different keys applied to the same message lead to two cryptograms in the same residue class, say $C'_i$. The two cryptograms therefore could each be deciphered by $k/\varphi_i$ keys into each message in $C_i$ and into no other possible messages. All keys being equally likely, the a posteriori probabilities of various messages are thus

$$P_E(M) = \frac{P(M) P_M(E)}{P(E)} = \frac{P(M) P_M(E)}{\sum_M P(M) P_M(E)} = \frac{P(M)}{P(C_i)},$$

where M is in $C_i$, E is in $C'_i$, the sum is over all messages in $C_i$, and $P(C_i)$ is the total probability of the messages in $C_i$ (the last step uses $P_M(E) = 1/\varphi_i$ for every M in $C_i$). If E and M are not in corresponding residue classes, $P_E(M) = 0$. Similarly it can be shown that the a posteriori probabilities of the different keys are the same in value, but these values are associated with different keys when a different key is used. The same set of values of $P_E(K)$ has undergone a permutation among the keys. Thus we have the result:

Theorem 5: In a pure system the a posteriori probabilities of various messages $P_E(M)$ are independent of the key that is chosen. The a posteriori probabilities of the keys $P_E(K)$ are the same in value but undergo a permutation with a different key choice.

Roughly, we may say that any key choice leads to the same cryptanalytic problem in a pure cipher. Since the different keys all result in cryptograms in the same residue class, this means that all cryptograms in the same residue class are cryptanalytically equivalent: they lead to the same a posteriori probabilities of messages and, apart from a permutation, the same probabilities of keys.

As an example of this, simple substitution with all keys equally likely is a pure cipher. The residue class corresponding to a given cryptogram E is the set of all cryptograms that may be obtained from E by operations $T_i T_k^{-1} E$. In this case $T_i T_k^{-1}$ is itself a substitution, and hence any substitution on E gives another member of the same residue class. [The example cryptograms given here are illegible in the scan.] It is obvious in this case that these cryptograms are essentially equivalent. All that is of importance in a simple substitution with random key is the pattern of letter repetitions, the actual letters being dummy variables. Indeed we might dispense with them entirely, indicating the pattern of repetitions in E as follows:*

[Pattern diagram illegible in the scan.]

This notation describes the residue class but eliminates all information as to the specific member of the class. Thus it leaves precisely that information which is cryptanalytically pertinent. This is related to one method of attacking simple substitution ciphers: the method of pattern words.
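One way to write the "pattern of letter repetitions" in code is to replace each letter by the order of its first appearance. The Python sketch below does this and applies it to the method of pattern words, keeping only vocabulary words with a matching pattern. The encoding, the example strings and the tiny vocabulary are illustrative choices, not taken from the scan.

```python
def pattern(word):
    """Replace each letter by the order of its first appearance.

    Cryptograms related by a simple substitution get the same pattern, so the
    pattern identifies the residue class and discards the dummy letters."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)


def pattern_word_candidates(cipher_word, vocabulary):
    """Method of pattern words: keep the words whose repetition pattern matches."""
    target = pattern(cipher_word)
    return [w for w in vocabulary if pattern(w) == target]


print(pattern("XCPPGCFQ"))  # (0, 1, 2, 2, 3, 1, 4, 5)
print(pattern_word_candidates("QZXXZM", ["LETTER", "BETTER", "LITTLE", "ATTACK"]))
# ['LETTER', 'BETTER']
```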

In the Caesar type cipher only the first differences mod 26 of the cryptogram are significant. Two cryptograms with the same $\Delta e_i$ are in the same residue class. One breaks this cipher by the simple process of writing down the 26 members of the message residue class and picking out the one which makes sense.
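A Python sketch of both observations: first differences mod 26 are unchanged by any Caesar shift, and the message residue class of a cryptogram is simply its 26 possible shifts. Picking "the one which makes sense" is left to the reader; the sample message is arbitrary.

```python
def shift(text, k):
    """Caesar (Vigenère of period 1): advance every letter k places, mod 26."""
    return "".join(chr((ord(c) - 65 + k) % 26 + 65) for c in text)


def first_differences(text):
    """Delta e_i = e_{i+1} - e_i (mod 26); unchanged by any Caesar shift."""
    nums = [ord(c) - 65 for c in text]
    return [(b - a) % 26 for a, b in zip(nums, nums[1:])]


def message_residue_class(cryptogram):
    """The 26 candidate decipherments; the analyst picks the one that makes sense."""
    return [shift(cryptogram, -k) for k in range(26)]


c = shift("ATTACKATDAWN", 7)
print(first_differences(c) == first_differences("ATTACKATDAWN"))  # True
print("ATTACKATDAWN" in message_residue_class(c))                  # True
```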

The Vigenère of period d with random key is another example of a pure cipher. Here the message residue class consists of all sequences with the same first differences as the cryptogram for letters separated by distance d. For d = 3 the residue class is defined by

$$m_1 - m_4 = e_1 - e_4, \quad m_2 - m_5 = e_2 - e_5, \quad m_3 - m_6 = e_3 - e_6, \quad m_4 - m_7 = e_4 - e_7, \quad \ldots \pmod{26}$$

where $E = e_1, e_2, \ldots$ is the cryptogram and $m_1, m_2, \ldots$ is any M in the corresponding residue class.

*Suggested by a notation used by Quine in Symbolic Logic.

In the transposition cipher of period d with random key, the residue class consists of all arrangements of the $e_i$ in which no $e_i$ is moved out of its block of length d, and any two $e_i$ at a distance d remain at this distance. This is used in breaking these ciphers as follows. The cryptogram is written in successive blocks of length d, one under another, as below (d = 5):

$$\begin{array}{ccccc} e_1 & e_2 & e_3 & e_4 & e_5 \\ e_6 & e_7 & e_8 & e_9 & e_{10} \\ e_{11} & e_{12} & \ldots & & \end{array}$$

The columns are then cut apart and rearranged to make sense. When the columns are cut apart, the only information remaining is the residue class of the cryptogram.

Theorem 6: If T is pure, then $T_i T_j^{-1} T = T$, where $T_i, T_j$ are any two transformations of T. Conversely, if this is true for any $T_i, T_j$ in a system T, then T is pure.

The first part of this theorem is obvious from the definition of a pure system. To prove the second part we note first that if $T_i T_j^{-1} T = T$, then $T_i T_j^{-1} T_s$ is a transformation of T. It remains to show that all keys are equiprobable. We have $T = \sum_s p_s T_s$ and

$$\sum_s p_s T_i T_j^{-1} T_s = \sum_s p_s T_s.$$

The term in the left-hand sum with s = j yields $p_j T_i$. The only term in $T_i$ on the right is $p_i T_i$. Since all coefficients are non-negative, it follows that

$$p_j \le p_i.$$

The same argument holds with i and j interchanged, and consequently

$$p_j = p_i,$$

and T is pure. Thus the condition that $T_i T_j^{-1} T = T$ might be used as an alternative definition of a pure system.
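The defining condition can be tested directly on a toy example. This hypothetical Python sketch takes as message space the nine two-letter messages over a three-letter alphabet, builds simple substitution S and period-2 transposition T with equally likely keys, and checks purity as defined in the text: the closure of $T_i T_j^{-1} T_k$ together with equal key probabilities. S and T are each pure, but the transformations of pS + qT fail the closure test, so that system is mixed.

```python
from fractions import Fraction
from itertools import permutations, product


def compose(t, r):
    """Apply r first, then t; transformations are tuples of images."""
    return tuple(t[r[x]] for x in range(len(r)))


def inverse(t):
    inv = [0] * len(t)
    for x, y in enumerate(t):
        inv[y] = x
    return tuple(inv)


def closed(ts):
    """For all T_i, T_j, T_k in ts, is T_i T_j^{-1} T_k again in ts?"""
    return all(compose(compose(a, inverse(b)), c) in ts
               for a, b, c in product(ts, repeat=3))


def is_pure(system):
    """Purity as defined in the text: the closure condition plus equally likely keys."""
    return len(set(system.values())) == 1 and closed(set(system))


# Hypothetical toy message space: the nine messages (a, b) over {0, 1, 2}, index 3*a + b.
PAIRS = [(a, b) for a in range(3) for b in range(3)]


def as_transformation(fn):
    return tuple(3 * x + y for x, y in (fn(a, b) for a, b in PAIRS))


S = {as_transformation(lambda a, b, f=f: (f[a], f[b])): Fraction(1, 6)
     for f in permutations(range(3))}                         # simple substitution
T = {as_transformation(lambda a, b: (a, b)): Fraction(1, 2),
     as_transformation(lambda a, b: (b, a)): Fraction(1, 2)}  # transposition, period 2

print(is_pure(S), is_pure(T), closed(set(S) | set(T)))  # True True False
```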

The property of purity in a system is connected with idempotence. Thus consider the system S = TT', where T is pure. We have

$$T_i T_j^{-1} T_k T_l^{-1} = T_s T_l^{-1},$$

where $T_s = T_i T_j^{-1} T_k$, so that the transformations of $S^2$ are the same as those of S, and since both S and $S^2$ are pure we have

$$S = S^2.$$

Theorem 7: If T is pure, S = TT' is pure and $S^2 = S$.

An endomorphic system T which satisfies the condition $T_i T_j = T_s$ (but not necessarily with all key probabilities equal) can be shown to approach a pure cipher on raising to a high power, namely the one with the same transformations but with all probabilities equalized. In fact the probabilities for $T^{n+1}$ are derived from those for $T^n$ by a Markoff process, of a special type due to the group property. This special type always approaches the limit of equalized probabilities. The same argument applies more generally. We have

Theorem 8: Let T be any endomorphic cipher. If $T^n$ approaches any limit at all, which will necessarily occur if all the transformations of $T^n$ lie in a finite set (no matter how large n) and the transformations of T include the identity, then this limit will be a pure cipher.
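The Markoff process is easy to watch for a Caesar-type system, where the transformations (the 26 shifts) satisfy $T_i T_j = T_s$. The Python sketch below assumes hypothetical unequal key probabilities, with the identity (shift 0) included as the theorem requires. Each further factor of T convolves the key distribution, and after a few hundred steps all 26 probabilities are essentially 1/26.

```python
def power_distribution(key_probs, n):
    """Key probabilities of T^n for a Caesar-type system T.

    key_probs[k] is the probability of the shift k. The product of shifts j and k
    is the shift j + k, so each extra factor of T convolves the distribution:
    a Markoff process on the shifts."""
    size = len(key_probs)
    dist = [1.0] + [0.0] * (size - 1)  # T^0 is the identity
    for _ in range(n):
        new = [0.0] * size
        for a, pa in enumerate(dist):
            for b, pb in enumerate(key_probs):
                new[(a + b) % size] += pa * pb
        dist = new
    return dist


# Hypothetical unequal key probabilities; shift 0 (the identity) is included.
probs = [0.0] * 26
probs[0], probs[1], probs[5] = 0.5, 0.3, 0.2

limit = power_distribution(probs, 200)
print(max(abs(x - 1 / 26) for x in limit) < 1e-6)  # True: probabilities equalized
```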

As an example consider the cipher

$$R = pT + qS$$

where T is transposition with random key and S substitution with random key. We have

$$S^2 = S, \qquad T^2 = T, \qquad ST = TS,$$

and hence any product of T's and S's, such as TSTTTSS, reduces to ST. Thus

$$R^n = p^n T + q^n S + (1 - p^n - q^n)\, ST .$$

As $n \to \infty$ the first two terms approach zero and

$$\lim_{n \to \infty} R^n = ST .$$
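The reduction of $R^n$ can be checked mechanically. This sketch models only the three elements T, S and ST with the multiplication rules stated above, treats a weighted system as a dictionary from element to probability, and uses exact fractions with the illustrative values p = 1/3, q = 2/3 (assumptions chosen for the demonstration, not values from the text).

```python
from fractions import Fraction

def multiply(a, b):
    """Product in the semigroup generated by T and S, using
    S S = S, T T = T and S T = T S, so every mixed word reduces to ST."""
    if a == b and a in ("T", "S"):
        return a
    return "ST"

def system_product(x, y):
    """Product of two weighted systems given as {element: probability}."""
    out = {}
    for a, pa in x.items():
        for b, pb in y.items():
            c = multiply(a, b)
            out[c] = out.get(c, 0) + pa * pb
    return out

def power(r, n):
    result = r
    for _ in range(n - 1):
        result = system_product(result, r)
    return result

p, q = Fraction(1, 3), Fraction(2, 3)   # illustrative weights, p + q = 1
R = {"T": p, "S": q}

for n in (1, 2, 5, 20):
    Rn = power(R, n)
    # R^n = p^n T + q^n S + (1 - p^n - q^n) ST
    assert Rn.get("T", 0) == p**n
    assert Rn.get("S", 0) == q**n
    assert Rn.get("ST", 0) == 1 - p**n - q**n
    print(n, float(Rn.get("ST", 0)))   # weight on ST creeps toward 1
```

Exact fractions are used so that the coefficients can be compared with the formula without rounding error.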

The concepts of pure and mixed languages and pure and mixed ciphers have an application in practical cryptanalysis, if we interpret them somewhat loosely. When a cryptographer starts work on a cryptogram, his first job is to determine the original language. Approximately, then, he is determining the pure component of the general language space

$$L = p_1 L_1 + p_2 L_2 + \cdots + p_n L_n$$

where, say, $L_1$ is English, $L_2$ German, etc. Of course these are not pure, but the different components of them are fairly close together in statistical structure.

The second thing a cryptographer does is to determine the "type" of cipher that was used; usually this is about the same as finding the pure component in the general cipher system

$$R = p_1 S + p_2 T + p_3 V + \cdots$$

where S, say, is simple substitution, T is transposition, etc. A Vigenere V of unknown period is not a pure cipher, but the decomposition

$$V = p_1 V_1 + p_2 V_2 + p_3 V_3 + \cdots,$$

where $V_i$ is of period i, is into pure components (if all keys are equally likely for any period). In solving a Vigenere the first problem is to determine the period. The same is true in transposition.

The reason for this initial isolation of pure (or nearly pure) language and cipher is that only then can a simple, meaningful statistical analysis be carried out.


16. Involutory Systems 

If every transformation in a system T is its own inverse, i.e. if

$$T_i T_i = I \quad \text{(the identity)}$$

for every i, the system will be called involutory. Such systems are important practically since the enciphering and deciphering operations are then identical. This leads to simplified instructions to cryptographic clerks in manual operation, or, in mechanical cases, the same machine with the same key setting may be used for both operations.

Examples: In simple substitution we may limit our transformations to those in which, when letter θ is the substitute for φ, φ is the substitute for θ. Another example is the Beaufort cipher.

If T is involutory, so is the system whose operations are

$$S^{-1} T_i S,$$

where S is a fixed transformation having an inverse, since

$$(S^{-1} T_i S)(S^{-1} T_i S) = S^{-1} T_i T_i S = S^{-1} S = I .$$
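A quick check of the involutory property, assuming letters are numbered 0 to 25 and a Beaufort transformation is taken as $m \mapsto k - m \pmod{26}$. The map S used for conjugation ($x \mapsto 7x + 3$) is an arbitrary invertible choice made only for illustration.

```python
N = 26
identity = tuple(range(N))

def beaufort(k):
    """Beaufort-type transformation on letters 0..25: m -> k - m (mod 26)."""
    return tuple((k - m) % N for m in range(N))

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[x]] for x in range(N))

def inverse(f):
    inv = [0] * N
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)

# Each Beaufort transformation is its own inverse: T_k T_k = I.
assert all(compose(beaufort(k), beaufort(k)) == identity for k in range(N))

# Conjugating by a fixed invertible S keeps every operation involutory.
S = tuple((7 * x + 3) % N for x in range(N))   # arbitrary invertible map (7 is prime to 26)
S_inv = inverse(S)
conjugates = [compose(S_inv, compose(beaufort(k), S)) for k in range(N)]  # S^{-1} T_k S
assert all(compose(c, c) == identity for c in conjugates)
```

Because enciphering and deciphering coincide, the same `beaufort(k)` table serves both directions.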

17. Similar and Weakly Similar Systems

Two secrecy systems R and S will be said to be similar if there exists a transformation A having an inverse $A^{-1}$ such that

$$R = A S .$$

This means that enciphering with R is the same as enciphering with S and then operating on the result with the transformation A. If we write $R \sim S$ to mean R is similar to S, then it is clear that $R \sim S$ implies $S \sim R$. Also $R \sim S$ and $S \sim T$ imply $R \sim T$, and finally $R \sim R$. These are summarized in mathematical terminology by saying that similarity is an equivalence relation.

The cryptographic significance of similarity is that if $R \sim S$ then R and S are equivalent from the cryptanalytic point of view. Indeed, if a cryptanalyst intercepts a cryptogram in system S he can transform it to one in system R by merely applying the transformation A to it. A cryptogram in system R is transformed to one in S by applying $A^{-1}$. If R and S are applied to the same language or message space, there is a one-to-one correspondence between the resulting cryptograms. Corresponding cryptograms give the same distribution of a posteriori probabilities for all messages.

If one has a method of breaking the system R, then any system S similar to R can be broken by reducing to R through application of the operation A. This is a device that is frequently used in practical cryptanalysis.

Examples: As a trivial example, simple substitution where the substitutes are not letters but arbitrary symbols is similar to simple substitution using letter substitutes. A second example is the Caesar and the reversed Caesar type ciphers. The latter is sometimes broken by first transforming it into a Caesar type. The Vigenere, Beaufort and Variant Beaufort are all similar when the key is random. The "autokey" cipher primed with the key $K_1 K_2 \ldots K_d$ is similar to a Vigenere type with the key alternately added and subtracted mod 26. The transformation A in this case is that of "deciphering" the autokey with a series of d A's for the priming key.
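The Caesar and reversed Caesar similarity can be sketched with letters numbered 0 to 25, taking the reversed Caesar as $m \mapsto k - m$ and A as $e \mapsto -e \pmod{26}$. These concrete forms are assumptions for the example; with them, applying A to a reversed Caesar cryptogram yields an ordinary Caesar cryptogram with key $-k$, which can then be solved by running down the alphabet.

```python
N = 26

def caesar(m, k):
    return (m + k) % N

def reversed_caesar(m, k):
    """Reversed Caesar (assumed form): cryptogram letter k - m (mod 26)."""
    return (k - m) % N

def A(e):
    """Fixed invertible transformation e -> -e (mod 26); here A is its own inverse."""
    return (-e) % N

# R = A S: reversed Caesar with key k, followed by A, equals Caesar with key -k.
assert all(A(reversed_caesar(m, k)) == caesar(m, (-k) % N)
           for m in range(N) for k in range(N))

def to_nums(text):
    return [ord(c) - ord("A") for c in text]

def to_text(nums):
    return "".join(chr(x + ord("A")) for x in nums)

crypt = [reversed_caesar(m, 10) for m in to_nums("CREASES")]
as_caesar = [A(e) for e in crypt]          # now an ordinary Caesar cryptogram
candidates = [to_text([(e + s) % N for e in as_caesar]) for s in range(N)]
print("CREASES" in candidates)   # True
```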


Two systems R and S are weakly similar if there exist two transformations A and B having inverses $A^{-1}$ and $B^{-1}$ with

$$R = A S B .$$

This means that system R is the same as applying first B to the language, then S, and finally A. This relation is also an equivalence relation.

Finding a method of solution for system R with language L is equivalent to finding a solution for S with language BL.

We may note that if R is pure and S is weakly similar to R then S is pure. This follows from

$$R_i R_j^{-1} R_k = R_t,$$

$$R_i = A S_i B, \qquad R_j^{-1} = B^{-1} S_j^{-1} A^{-1}, \qquad R_k = A S_k B,$$

where we assume corresponding transformations in R and S to have the same subscripts. Hence

$$R_i R_j^{-1} R_k = A S_i S_j^{-1} S_k B = R_t = A S_t B,$$

so that $S_i S_j^{-1} S_k = S_t$, and S is therefore pure.


Theoretical Secrecy 


We now consider problems connected with the "theoretical secrecy" of a system. How immune is a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms? Does a cryptogram have a unique solution (even though it may require an impractical amount of work to find it), and if not, how many reasonable solutions does it have? How much text in a given system must be intercepted before the solution becomes unique? Are there systems which never become unique in solution no matter how much enciphered text is intercepted? Are there systems for which no information whatever is given to the enemy no matter how much text is intercepted?

18. Perfect Secrecy

Let us suppose the possible messages are finite in number, $M_1, \ldots, M_n$, and have a priori probabilities $P(M_1), \ldots, P(M_n)$, and that these are enciphered into the possible cryptograms $E_1, \ldots, E_m$ by

$$E = T_i M .$$

The cryptanalyst intercepts a particular E and can then calculate the a posteriori probabilities for the various messages, $P_E(M)$. It is natural to define perfect secrecy by the condition that, for all E, the a posteriori probabilities are equal to the a priori probabilities, independently of the values of these. In this case, intercepting the message has given the cryptanalyst no information.* Any action of his which depends on the information contained in the cryptogram cannot be altered, for all of his probabilities as to what the cryptogram contains remain unchanged. On the other hand, if the condition is not satisfied, there will exist situations in which the enemy has certain a priori probabilities, and certain key and message choices for which the enemy's probabilities do change. This in turn may affect his actions, and thus perfect secrecy has not been obtained. Hence the definition given is necessarily required by our ideas of what perfect secrecy should mean.

*A purist might object that the enemy has obtained a bit of information in that he knows a message was sent. This may be answered by having among the messages a "blank" corresponding to "no message." If no message is originated, the blank is enciphered and sent as a cryptogram. Then even this modicum of remaining information is eliminated.

A necessary and sufficient condition for perfect secrecy can be found as follows. We have by Bayes' theorem

$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)}$$

and this must equal P(M) for perfect secrecy. Hence either $P(M) = 0$, a solution that must be excluded since we demand the equality independent of the values of P(M), or

$$P_M(E) = P(E)$$

for every M and E. Conversely, if $P_M(E) = P(E)$ then $P_E(M) = P(M)$ and we have perfect secrecy. Thus we have the result:

Theorem 9: A necessary and sufficient condition for perfect secrecy is that

$$P_M(E) = P(E)$$

for all M and E. That is, $P_M(E)$ must be independent of M.

The probability of all keys that transform $M_i$ into a given cryptogram E is equal to that of all keys transforming $M_j$ into the same E.

Now there must be as many E's as there are M's since, for a fixed i, $T_i$ gives a one-to-one correspondence between all the M's and some of the E's. For perfect secrecy $P_M(E) = P(E) \neq 0$ for any of these E's and any M. Hence there is at least one key transforming any M into any of these E's. But all the keys from a fixed M to different E's must be different, and therefore the number of different keys is at least as great as the number of M's. It is possible to obtain perfect secrecy with no more, as one shows by the following example. Let the $M_i$ be numbered 1 to n and the $E_i$ the same, and using n equally likely keys let

$$T_i M_j = E_s$$

where $s = i + j \pmod{n}$. In this case we see that $P_M(E) = 1/n = P(E)$ and we have perfect secrecy. An example is shown with n = 5.
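The n = 5 example can be verified numerically. This sketch assumes equally likely keys and an arbitrary (hypothetical) set of a priori message probabilities, and checks that $P_M(E) = 1/n = P(E)$ for every pair, so that Bayes' theorem returns the a priori probabilities unchanged.

```python
from fractions import Fraction

n = 5
keys = range(n)
P_K = {k: Fraction(1, n) for k in keys}      # all keys equally likely
# Arbitrary a priori message probabilities (hypothetical values).
P_M = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8),
       3: Fraction(1, 16), 4: Fraction(1, 16)}

def encipher(k, m):
    """T_k M_m = E_s with s = k + m (mod n)."""
    return (k + m) % n

P_E = {e: sum(P_M[m] * P_K[k] for m in range(n) for k in keys if encipher(k, m) == e)
       for e in range(n)}

for m in range(n):
    for e in range(n):
        p_m_e = sum(P_K[k] for k in keys if encipher(k, m) == e)   # P_M(E)
        assert p_m_e == Fraction(1, n) == P_E[e]
        assert P_M[m] * p_m_e / P_E[e] == P_M[m]                   # P_E(M) = P(M)
```

Changing `P_M` to any other distribution leaves every assertion true, which is the "independent of the a priori probabilities" requirement.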


These perfect systems, in which the number of cryptograms, the number of messages, and the number of keys are all equal, are characterized by the properties that (1) each M is connected to each E by exactly one line, and (2) all keys are equally likely. Thus the three matrix representations of the system are "Latin squares."

We have then concealed completely an amount of information at most log n with a size of key log n. This is the first example of a general principle which we will often see: there is a limit to what we can obtain with a given key size, and the amount of uncertainty we can introduce into the solution of a cryptogram cannot be greater than the key size. Here we have concealed all the information, but the key size is as large as the message space.

We now consider the case where |M| is infinite; in particular, suppose the message is generated as an unending sequence of letters by a Markoff process. The maximum rate of this source is $R_0$. It is clear from our results above that no finite key will give perfect secrecy. We suppose then that the key source generates key in the same manner, i.e. as an infinite sequence of symbols with a mean rate $R_K$. Suppose that only a certain length of key $L_K$ is needed to encipher and decipher a length $L_M$ of message.

Theorem 10: For perfect secrecy (when the a priori probabilities of various messages can be anything), for large $L_M$

$$R_0 L_M \le R_K L_K$$

and the rate $(R_0 + \epsilon)\, L_M / L_K$ is asymptotically sufficient.

This may be proved by essentially the same method as the finite case. This case is realized by the Vernam system.

These results have been deduced on the basis of unknown or arbitrary a priori probabilities for the messages. The key required for perfect secrecy depends then on the total number of possible messages, or on the maximum rate $R_0$ of the message source.

One would suspect that if the message space has fixed, known statistics, so that it has a definite mean rate R of generating information, then the amount of key needed could be reduced in an average sense in just this ratio $R/R_0$, and this is indeed true. In fact the message can be passed through a transducer which transforms it into a normal form and reduces the expected length in just this ratio, and then a Vernam system may be applied to the result. Evidently the amount of key used per letter of message is statistically reduced by a factor $R/R_0$, and in this case the key source and information source are just matched: an alternative of key conceals an alternative of information. It is easily seen also, by the methods used in the "Information" paper, that this is the best that can be done.

Theorem 11: Perfect secrecy (omitting the condition of independence of a priori probabilities) for a source with fixed statistics and a rate R of generating information can be achieved with a key source which generates at the rate $(R + \epsilon)\, L_M / L_K$, where $L_M$ and $L_K$ are message and key lengths which correspond. A rate less than $R\, L_M / L_K$ is insufficient.

Perfect secrecy systems have a place in the practical picture: they may be used either where the greatest importance is attached to complete secrecy, e.g. correspondence between the highest levels of command, or in cases where the number of possible messages is small. Thus, to take an extreme example, if only two messages, "yes" or "no," were anticipated, a perfect system would be in order, with perhaps the transformation:










The disadvantage of perfect systems for large correspondence systems is, of course, the equivalent amount of key that must be sent. In succeeding sections we consider what can be achieved with smaller key size, in particular with finite keys.

19. Equivocation 

Let us suppose that a simple substitution cipher has been used on English text and that we intercept a certain amount, N letters, of the enciphered text. For N fairly large, more than say 50 letters, there is nearly always a unique solution to the cipher, i.e. a single good English sequence which transforms into the intercepted material by a simple substitution. With a smaller N, however, the chance of more than one solution is greater; with N = 15 there will generally be quite a number of possible fragments of text that would fit, while with N = 8 a good fraction (of the order of 1/8) of all reasonable English sequences of that length are possible, since there is seldom more than one repeated letter in the 8. With N = 1 any letter is clearly possible and has the same a posteriori probability as its a priori probability. For one letter the system is perfect.

This happens generally with solvable ciphers. Before any material is intercepted we can imagine the a priori probabilities attached to the various possible messages, and also to the various keys. As material is intercepted, the cryptanalyst calculates the a posteriori probabilities; and as N increases the probabilities of certain messages increase, and of most decrease, until finally only one is left, which has a probability nearly one, while the total probability of all others is nearly zero.

This calculation can actually be carried out for simple systems. Table 1 shows the a posteriori probabilities for a Caesar type cipher applied to English text, with the key chosen at random from the 26 possibilities. To enable the use of standard letter, digram and trigram frequency tables, the text has been started at a random point (by opening a book and putting a pencil down at random on the page). The message selected in this way begins "creases to ...," starting inside the word "increases." If the message were to start at the beginning of a sentence, a different set of probabilities would have to be used, corresponding to the frequencies of letters, digrams, etc., at the beginning of sentences.

The Caesar with random key is a pure cipher, and the particular key chosen does not affect the a posteriori probabilities. To determine these we need merely list the possible decipherments by all keys and calculate their a priori probabilities. The a posteriori probabilities are these divided by their sum. These possible decipherments are found by the standard process of "running down the alphabet" from the message and are listed at the left. They form the residue class for the message. For one intercepted letter the a posteriori probabilities are equal to the a priori probabilities for letters, and are shown in the column headed N = 1. For two intercepted letters the probabilities are those for digrams adjusted to sum to unity, and these are shown in the column N = 2.
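The "running down the alphabet" step and the normalization of a posteriori probabilities can be sketched as follows. The function `a_priori` stands for whatever letter, digram or trigram table is available; the toy weighting in the usage lines is purely illustrative and is not English statistics.

```python
import string

ALPHABET = string.ascii_uppercase

def residue_class(cryptogram):
    """All 26 Caesar decipherments, found by running down the alphabet."""
    return ["".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in cryptogram)
            for shift in range(26)]

def a_posteriori(rows, a_priori):
    """Divide the a priori probability of each candidate by the sum over the class.
    `a_priori` is supplied by the caller (e.g. from letter, digram or trigram tables)."""
    weights = [a_priori(r) for r in rows]
    total = sum(weights)
    return {r: w / total for r, w in zip(rows, weights)}

rows = residue_class("CREAS")
print(rows[:4])   # ['CREAS', 'DSFBT', 'ETGCU', 'FUHDV']

# Toy weighting for illustration only: favour candidates that start with a vowel.
posterior = a_posteriori(rows, lambda r: 2.0 if r[0] in "AEIOU" else 1.0)
```

Because the Caesar is pure, the same 26 rows arise whichever key was actually used, which is why the key does not affect the a posteriori probabilities.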

Table 1

A Posteriori Probabilities for a Caesar Type Cryptogram

[Table: one row for each of the 26 decipherments in the residue class (C R E A S, D S F B T, E T G C U, ..., A P C Y Q, B Q D Z R), with columns N = 1, N = 2, N = 3, N = 4 and N = 5 giving the a posteriori probability of each decipherment after N letters; the final row, Q (digits), gives the equivocation. The numerical entries are not legible in this copy.]


Trigram frequencies have also been tabulated, and these are used in column N = 3. For four- and five-letter sequences, probabilities were obtained by multiplication from trigram frequencies since, approximately,

$$p(ijkl) \approx p(ijk)\, p_{jk}(l) .$$


Note that at three letters the field has narrowed to four messages of fairly high probability, the others being small in comparison. At four there are two possibilities and at five just one, the correct decipherment.

In principle this could be carried out with any system, but unless the key is very small the number of possibilities is so large that the work involved prohibits the actual calculation.

This set of a posteriori probabilities describes how the cryptanalyst's knowledge of the message and key gradually becomes more precise as enciphered material is obtained. This description, however, is much too involved and difficult to obtain for our purposes. What is desired is a simplified description of this approach to uniqueness of the possible solutions.

We will first define a quantity Q, called the "equivocation," which measures in an average way our uncertainty about the solution, or how far it is from unicity. Suppose that a certain cryptogram E of N letters has been intercepted. The cryptanalyst can in principle calculate the a posteriori probabilities by the use of Bayes' theorem. Thus

$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)} .$$

Similarly the probabilities for various keys, after E has been intercepted, are given by

$$P_E(K) = \frac{P(K)\, P_K(E)}{P(E)} .$$

The equivocation of the message should measure in some way how spread out these probabilities $P_E(M)$ are: how far they are from being concentrated at one message. In accordance with general principles of measuring such dispersion, as in choice, uncertainty, and the generation of information, we define the equivocation of the message when E has been intercepted as

$$Q_E(M) = -\sum_M P_E(M) \log P_E(M),$$

the summation being over all possible messages. Similarly the equivocation in key when E is intercepted is given by

$$Q_E(K) = -\sum_K P_E(K) \log P_E(K) .$$
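A minimal sketch of the two definitions for a single intercepted cryptogram: Bayes' theorem gives the a posteriori distribution, and the equivocation is $-\sum p \log p$ in a chosen base (2 for alternatives, 10 for decimal digits). The dictionaries of probabilities are illustrative inputs, not data from the text.

```python
import math

def posterior(prior, likelihood):
    """Bayes' theorem: P_E(X) = P(X) P_X(E) / P(E), with P(E) = sum_X P(X) P_X(E)."""
    p_e = sum(prior[x] * likelihood[x] for x in prior)
    return {x: prior[x] * likelihood[x] / p_e for x in prior}

def equivocation(dist, base=2):
    """-sum p log p over the distribution; base 2 gives alternatives, base 10 decimal digits."""
    return -sum(p * math.log(p, base) for p in dist.values() if p > 0)

# Hypothetical numbers: three messages and the probabilities P_M(E) of the intercepted E.
prior = {"M1": 0.5, "M2": 0.25, "M3": 0.25}
likelihood = {"M1": 0.0, "M2": 0.5, "M3": 0.5}
post = posterior(prior, likelihood)     # M1 is ruled out; M2 and M3 become equally likely
print(equivocation(prior), equivocation(post))
```

The same two functions serve for keys: pass P(K) and $P_K(E)$ instead of P(M) and $P_M(E)$.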

The same general arguments used to justify our measure of information rate may be used here to justify the equivocation measure. We note that zero equivocation requires that one message (or key) have probability one and all others zero. Equivocation is measured in the same units as information, i.e. alternatives, decimal digits, etc., according as the logarithmic base is 2, 10, etc. In fact, equivocation is almost identical with information, the difference being one of point of view. In information we stress the notion of how much freedom we have in choosing one element from a set with certain probabilities; in equivocation we emphasize the uncertainty of our knowledge of what was chosen when the probabilities have certain values.

Although any one number can hardly be expected to describe the set $P_E(M)$ perfectly for all purposes, I think the one defined here does as well as any single statistic can. Some of the theorems which follow indicate the mathematical "naturalness" of this particular measure.


The values of equivocation for the Caesar type cryptogram considered above have been calculated and are given in the last row of Table 1. This is the Q for both key and message, the two being equal in this case.

The definitions given above involve a particular intercepted E, and are the equivocations for that intercepted cryptogram. We wish, however, to find a measure of the equivocation for the system as a whole, which will describe this progress toward uniqueness as N increases in an average sort of way. To do this we form a weighted average of the equivocations for each particular intercepted message E, weighting in accordance with the probabilities of getting the E in question. This will be called the mean equivocation of the system or, where there is no chance of confusion with the narrower equivocation for a particular E, we abbreviate to merely the equivocation. The mean equivocation of message is

$$Q(M) = -\sum_{M,E} P(E)\, P_E(M) \log P_E(M),$$

the summation being over all M and all E. Since

$$P(E)\, P_E(M) = P(E, M),$$

the probability of getting both E and M, we can write this

$$Q(M) = -\sum_{M,E} P(M,E) \log P_E(M) = -\sum_{M,E} P(M,E) \log \frac{P(M,E)}{P(E)},$$

and similarly

$$Q(K) = -\sum_{K,E} P(K,E) \log P_E(K) .$$
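The mean equivocation can be computed directly from a joint distribution of (message, cryptogram) pairs, using the last form above. The sketch assumes the joint probabilities are supplied as a dictionary; the small example is hypothetical and chosen so the answer is easy to check by hand (one cryptogram leaves one alternative of uncertainty, the other none).

```python
import math
from collections import defaultdict

def mean_equivocation(joint, base=2):
    """Q(X) = -sum_{X,E} P(X,E) log P_E(X), where P_E(X) = P(X,E) / P(E).
    `joint` maps (x, e) pairs to their probabilities P(X, E)."""
    p_e = defaultdict(float)
    for (_, e), p in joint.items():
        p_e[e] += p
    return -sum(p * math.log(p / p_e[e], base) for (_, e), p in joint.items() if p > 0)

# Hypothetical joint distribution: E1 leaves two equally likely messages, E2 only one.
joint = {("M1", "E1"): 0.25, ("M2", "E1"): 0.25, ("M1", "E2"): 0.5}
print(mean_equivocation(joint))   # the weighted average 0.5 * 1 + 0.5 * 0
```

Passing (key, cryptogram) pairs instead gives the mean key equivocation Q(K).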

Either of these mean equivocations is a theoretical measure of the secrecy value of the system. We say theoretical since, even when the equivocation is zero, which corresponds to no uncertainty as to the message, it may require a tremendous amount of labor to locate the particular message where the probability is one. It might, for example, be necessary to try every possible K in succession until one was found that transformed the intercepted E into reasonable text in the language. Such a system would be practically very good, but theoretically solvable. The equivocation may be said to measure the degree of secrecy when the cryptanalyst has unlimited time and energy.

The equivocation is, of course, a function of N, the number of letters intercepted. The functions Q(K, N) and Q(M, N) will be called the equivocation characteristics of the system.

The following data will be helpful in forming a picture of what small values of equivocation represent.

An equivocation of .1 alternative would result (1) if 9 times in 10 there was no uncertainty as to M and the tenth time two M's were equally probable, or (2) if every time there were two possibilities, one with probability .983 and the other with probability .017, or (3) if 99 times in 100 there was no uncertainty and the 100th time there were 1000 equally likely possibilities.

An equivocation of .01 would result (1) if every time there were two possibilities, one with probability .999 and the other with probability .001, or (2) if 99 times in 100 there was no uncertainty and the other time two equally likely possibilities, or (3) if 999 times in 1000 there was no uncertainty and the other time 6 or 7 equally likely possibilities.
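Several of these figures can be reproduced by averaging $-\sum p \log_2 p$ over the situations described. The helper name `mixture_equivocation` is hypothetical; the sketch covers cases (1) and (3) for .1 alternative and case (2) for .01.

```python
import math

def mixture_equivocation(cases, base=2):
    """Average equivocation over situations given as [(frequency, [probabilities]), ...]."""
    return sum(w * -sum(p * math.log(p, base) for p in probs if p > 0) for w, probs in cases)

# .1 alternative, case (1): 9 times in 10 no uncertainty, the tenth time two equal M's
a1 = mixture_equivocation([(0.9, [1.0]), (0.1, [0.5, 0.5])])
# .1 alternative, case (3): 99 times in 100 no uncertainty, else 1000 equal possibilities
a3 = mixture_equivocation([(0.99, [1.0]), (0.01, [1 / 1000] * 1000)])
# .01 alternative, case (2): 99 times in 100 no uncertainty, else two equal possibilities
b2 = mixture_equivocation([(0.99, [1.0]), (0.01, [0.5, 0.5])])
print(round(a1, 4), round(a3, 4), round(b2, 4))
```

Case (3) for .1 comes out slightly below .1 because $\log_2 1000$ is a little less than 10.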


20. Properties of Equivocation

Equivocation may be shown to have a number of interesting properties, most of which fit into our intuitive picture of how such a quantity should behave. We may first show, by an example, the somewhat surprising fact that after a cryptanalyst has intercepted certain special E's, his equivocation as to key or message may be greater than before he intercepted anything. The intercepted material has increased his ignorance of what happened! Suppose there are only two messages $M_1$ and $M_2$ with a priori probabilities p and q, and that a simple substitution is used according to the following table, the two keys $K_1$ and $K_2$ also having the a priori probabilities p and q.


$$\begin{array}{c|cc} & K_1 & K_2 \\ \hline M_1 & E_2 & E_1 \\ M_2 & E_1 & E_2 \end{array}$$

Before the interception, the equivocation of both key and message is $-(p \log p + q \log q)$, which is less than one alternative if $p \neq q$. If $p \gg q$ there is little uncertainty as to which message and key will be chosen: $M_1$ and $K_1$. Now suppose he intercepts $E_1$. The a posteriori probabilities of both keys and both messages are easily seen to be 1/2, and hence the equivocation for both key and message is one alternative, greater than before. On the other hand, if $E_2$ is intercepted, the more probable event, the equivocation for both key and message decreases, more than enough to compensate for the other increase, and the mean equivocation of both key and message decreases. This is a general property of all secrecy systems.
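The example can be worked numerically. The sketch assumes the substitution table pairs ($M_1$, $K_1$) and ($M_2$, $K_2$) with $E_2$ and the two mixed pairs with $E_1$, which is the arrangement under which intercepting $E_1$ gives a posteriori probabilities of 1/2; p = 0.9 is an illustrative value.

```python
import math

def H(probs):
    """Equivocation in alternatives (base-2 logarithms)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def two_message_example(p):
    q = 1 - p
    before = H([p, q])                        # same for key and message
    # Assumed table: (M1, K1) and (M2, K2) give E2; (M1, K2) and (M2, K1) give E1.
    p_e1 = 2 * p * q
    p_e2 = p * p + q * q
    after_e1 = H([p * q / p_e1, q * p / p_e1])    # both candidates at 1/2
    after_e2 = H([p * p / p_e2, q * q / p_e2])
    mean_after = p_e1 * after_e1 + p_e2 * after_e2
    return before, after_e1, after_e2, mean_after

before, after_e1, after_e2, mean_after = two_message_example(0.9)
print(before, after_e1, after_e2, mean_after)
```

Intercepting the rare $E_1$ raises the equivocation to one alternative, while the average over both cryptograms falls below the value before interception.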

Theorem 12: The mean equivocation of key, $Q_K(N)$, is a non-increasing function of N. The mean equivocation of the first A letters of the message is a non-increasing function of the number N which have been intercepted. If N letters have been intercepted, the equivocation of the first N letters of message is less than or equal to that of the key. These may be written

$$Q_K(S) \le Q_K(N), \qquad S \ge N,$$

$$Q_M(S) \le Q_M(N), \qquad S \ge N \quad (Q_M \text{ for the first } A \text{ letters of message}),$$

$$Q_M(N) \le Q_K(N) .$$

The qualification regarding A letters in the second result of the theorem is so that the equivocation will not be calculated with respect to the amount of message that has been intercepted. If it is, the message equivocation may (and usually does) increase for a time, due merely to the fact that more letters stand for a larger possible range of messages. The results of the theorem are what we might hope from a good measure of equivocation, since we would hardly expect to be worse off on the average after intercepting material than before. The fact that they can be proved gives additional justification to our definition of equivocation.

The results of this theorem can be proved by a substitution in property 6 of Section 1. Thus to prove the first or second we have, for any chance events A and B,

$$Q(B) \ge Q_A(B) .$$

If we identify B with the key (knowing the first N letters of cryptogram) and A with the remaining S − N letters, we obtain the first result. Similarly, identifying B with the message gives the second result. The last result follows from

$$Q(M) \le Q(K) + Q_K(M)$$

and the fact that $Q_K(M) = 0$, since K uniquely determines M.

Theorem 13:

$$Q(K) = |M| - |E| + |K|$$

$$Q(M) = |M| - |E| - \sum_{M,E} P(M)\, P_M(E) \log P_M(E)$$

We have

$$Q(K) = -\sum_{K,E} P(K,E) \log P_E(K) = -\sum_{K,E} P(K)\, P_K(E) \log \frac{P(K)\, P_K(E)}{P(E)}$$

$$= -\sum P(K)\, P_K(E) \log P(K) - \sum P(K)\, P_K(E) \log P_K(E) + \sum P(K)\, P_K(E) \log P(E) .$$

Summing the first term on E gives $-\sum_K P(K) \log P(K) = |K|$. In the second term $P_K(E)$ is P(M), where M is the unique message that gives E with key K. Summing on E and then on K gives $-\sum_M P(M) \log P(M) = |M|$. The third term is $\sum_E P(E) \log P(E) = -|E|$.

The second equation in the theorem is proved by the same method:

$$Q(M) = -\sum_{M,E} P(E)\, P_E(M) \log P_E(M) = -\sum_{M,E} P(M)\, P_M(E) \log \frac{P(M)\, P_M(E)}{P(E)}$$

$$= -\sum P(M)\, P_M(E) \log P(M) - \sum P(M)\, P_M(E) \log P_M(E) + \sum P(M)\, P_M(E) \log P(E)$$

$$= |M| - |E| - \sum_{M,E} P(M)\, P_M(E) \log P_M(E) .$$

The last term here is interpreted as follows. Group together all the different keys that transform a fixed M into the same E, giving the total probability to the group, which will be $P_M(E)$. The last term is the average size of this group space, weighted according to the probability P(M) of choosing among the groups leading out of M. In case no group contains more than one element (at any rate no group from an M with $P(M) > 0$), the last term equals |K| and $Q(K) = Q(M)$. This is also clear since there is then a one-to-one correspondence between the keys and messages for any given E.
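Both equations of Theorem 13 can be checked on a toy system. This sketch assumes a Caesar cipher on a three-letter alphabet with one-letter messages and arbitrary non-uniform message and key probabilities; logarithms are to base 2, and |X| denotes $-\sum P(X) \log P(X)$.

```python
import math
from collections import defaultdict

def H(dist):
    """|X| = -sum P(X) log2 P(X)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

n = 3
P_M = {0: 0.5, 1: 0.3, 2: 0.2}      # hypothetical one-letter message probabilities
P_K = {0: 0.6, 1: 0.3, 2: 0.1}      # hypothetical (non-uniform) key probabilities

def encipher(k, m):
    return (m + k) % n               # Caesar on a three-letter alphabet

P_E = defaultdict(float)
P_KE = defaultdict(float)            # P(K, E)
P_ME = defaultdict(float)            # P(M, E)
P_M_of_E = defaultdict(float)        # P_M(E): total probability of the keys taking M to E
for k, pk in P_K.items():
    for m, pm in P_M.items():
        e = encipher(k, m)
        P_E[e] += pk * pm
        P_KE[(k, e)] += pk * pm
        P_ME[(m, e)] += pk * pm
        P_M_of_E[(m, e)] += pk

Q_K = -sum(p * math.log2(p / P_E[e]) for (k, e), p in P_KE.items())
Q_M = -sum(p * math.log2(p / P_E[e]) for (m, e), p in P_ME.items())
last = -sum(P_M[m] * pme * math.log2(pme) for (m, e), pme in P_M_of_E.items())

print(math.isclose(Q_K, H(P_M) - H(P_E) + H(P_K)),
      math.isclose(Q_M, H(P_M) - H(P_E) + last))
```

In a Caesar cipher no two keys take the same M to the same E, so the last term equals |K| and Q(K) = Q(M), as the text notes.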

From the first equation of the theorem we may conclude that $Q(K) = |K|$ in case $|M| = |E|$. This latter occurs in particular if all M's are equally likely and all E's are equally likely and there are the same number of each. It is easy to see that this is the case with a language in which every letter is equally likely and independent, and when almost any of the simple ciphers are used.

If we have a product system S = T R, it is to be expected that the second enciphering process does not decrease the equivocation of message, and this is actually true, as can be shown by the methods used above. If T and R commute, either may be considered as being the first, and hence in this case the equivocation with S is not less than the maximum for the two systems R and T. Simple examples show that this does not necessarily hold if R and T do not commute.

Theorem 14: The equivocation in message of a product system S = T R is not less than that when only R is used. If T R = R T it is not less than the maximum of those for R and T alone.

If we have a product of several systems R S T U, we can of course extend this to say that the equivocation of R S T U is not less than that of S T U, which is not less than that for T U, etc.

There is no similar theorem for the inner product since, for example, if T and R are inverse processes their inner product is the identity and the resulting equivocation zero.

Suppose we have a system T which can be written as a weighted sum of several systems R, S, ..., U,

$$T = p_1 R + p_2 S + \cdots + p_m U, \qquad \sum_i p_i = 1,$$

and that systems R, S, ..., U have equivocation characteristics $Q_1, Q_2, \ldots, Q_m$.

Theorem 15: The equivocation Q of a weighted sum of systems is bounded by the inequalities

$$\sum_i p_i Q_i \le Q \le \sum_i p_i Q_i - \sum_i p_i \log p_i .$$

These are the best limits possible. The Q's may refer either to key or to message.

The upper limit is achieved, for example, in strongly ideal systems (to be described later) where the decomposition is into the simple transformations of the system. The lower limit is achieved if all the systems R, S, ..., U go to completely different cryptogram spaces. This theorem is also proved by the general inequalities governing equivocation,

$$Q_A(B) \le Q(B) \le Q(A) + Q_A(B) .$$

We identify A with the particular system being used and B with the key or message.
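A numerical check of Theorem 15 for key equivocation, under toy assumptions: R is a Caesar cipher on {0, 1, 2}; S uses the same shifts but writes its cryptograms in a disjoint alphabet {3, 4, 5}, so the lower limit should be attained; a second weighted sum of R with itself, where the choice of component is independent of E, reaches the upper limit. The helper names are hypothetical.

```python
import math
from collections import defaultdict

def key_equivocation(system, P_M):
    """Q(K) = -sum P(K,E) log2 P_E(K) for a system given as a list of
    (key probability, transformation dict); the list index identifies the key."""
    p_e, p_ke = defaultdict(float), defaultdict(float)
    for k, (pk, T) in enumerate(system):
        for m, pm in P_M.items():
            e = T[m]
            p_e[e] += pk * pm
            p_ke[(k, e)] += pk * pm
    return -sum(p * math.log2(p / p_e[e]) for (k, e), p in p_ke.items() if p > 0)

def weighted_sum(weights, systems):
    """p_1 R + p_2 S + ...: choose component i with probability p_i, then one of its keys."""
    return [(w * pk, T) for w, parts in zip(weights, systems) for pk, T in parts]

def bounds(weights, q_parts):
    lower = sum(w * q for w, q in zip(weights, q_parts))
    return lower, lower - sum(w * math.log2(w) for w in weights)

P_M = {0: 0.5, 1: 0.3, 2: 0.2}     # hypothetical message probabilities
R = [(1 / 3, {m: (m + k) % 3 for m in range(3)}) for k in range(3)]      # Caesar on {0,1,2}
S = [(1 / 3, {m: 3 + (m + k) % 3 for m in range(3)}) for k in range(3)]  # cryptograms in {3,4,5}
w = [0.4, 0.6]

# Disjoint cryptogram spaces: the lower limit is attained.
Q1 = key_equivocation(weighted_sum(w, [R, S]), P_M)
lo1, hi1 = bounds(w, [key_equivocation(R, P_M), key_equivocation(S, P_M)])

# R weighted with itself: the component chosen is independent of E, so the upper limit is attained.
Q2 = key_equivocation(weighted_sum(w, [R, R]), P_M)
lo2, hi2 = bounds(w, [key_equivocation(R, P_M)] * 2)

print(lo1 <= Q1 + 1e-12 and Q1 <= hi1 + 1e-12, lo2 <= Q2 + 1e-12 and Q2 <= hi2 + 1e-12)
```

The gap between the two limits is exactly the uncertainty $-\sum p_i \log p_i$ about which component system was used.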

There is a similar theorem for weighted sums of languages.

Theorem 16: Suppose a system can be applied to languages $L_1, L_2, \ldots, L_m$ and has equivocation characteristics $Q_1, Q_2, \ldots, Q_m$. When applied to the weighted sum $\sum_i p_i L_i$, the equivocation Q is bounded by

$$\sum_i p_i Q_i \le Q \le \sum_i p_i Q_i - \sum_i p_i \log p_i .$$

These limits are the best possible, and the equivocations in question can be either for key or message.

The proof here is essentially the same as for the preceding case.

An important consequence of the result

$$Q(K) = |K| + |M| - |E|$$

is the following.

Theorem 17: In any closed system, or any system where the total number of possible cryptograms is equal to the number of possible messages of N letters,

$$Q(K) \ge |K| - (\bar{M} - |M|) = |K| - D_N,$$

where $\bar{M} = \log H$, with H the number of possible messages of N letters. $D_N$ is the total redundancy for N letters.

This is true since $\bar{M} \ge |E|$, the equality holding only if all cryptograms are equally likely. The theorem states that in a closed system the key is determined only by the redundancy of the language: the equivocation can decrease only as the redundancy comes into action, and at no greater rate.

Suppose we have a pure system and let the different residue classes of messages be $C_1, C_2, \ldots, C_r$. The corresponding set of residue classes of cryptograms is $C'_1, \ldots, C'_r$. The probability of each E in $C'_i$ is the same:

$$P(E) = \frac{P(C_i)}{\varphi_i}, \qquad E \in C'_i,$$

where $\varphi_i$ is the number of different messages in $C_i$. Thus

$$|E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} .$$

Substituting in our equation for Q(K), we obtain:

Theorem 18: For a pure cipher

$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} .$$

This result can be used to compute Q in many cases of interest.

From the analytic point of view pure ciphers have a simple structure. If a cryptogram is intercepted, its residue class gives the complete information obtained by the cryptanalyst. Within the residue class the system is perfect: each message in the class has an a posteriori probability equal to its a priori probability within the class, $P_E(M) = P(M)/P(C_i)$. For large N, beyond the unicity point, there will usually be only one M in the class of reasonable probability, and the problem is to determine this M.

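The theorem on equivocation of pure ciphers can be altered to show this. With k the number of (equally likely) keys, so that $|K| = \log k$, we have

$$\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = \sum_i P(C_i) \log P(C_i) - \sum_i P(C_i) \log k + \sum_i P(C_i) \log \frac{k}{\varphi_i} = \sum_i P(C_i) \log P(C_i) - |K| + Q_M(K),$$

since $k/\varphi_i$ keys carry a given M in $C_i$ to a given E in $C'_i$, so that $\sum_i P(C_i) \log (k/\varphi_i)$ is the residual key equivocation $Q_M(K)$. Hence

$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = |M| + Q_M(K) + \sum_i P(C_i) \log P(C_i),$$

$$Q(M) = |M| - \Big[-\sum_i P(C_i) \log P(C_i)\Big] .$$

The equivocation of message is the equivocation of message before the cryptogram was intercepted, less the information imparted by specification of its residue class.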
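Theorem 18 and the final identity can be checked on a small pure cipher. The sketch assumes a Caesar cipher on two-letter messages over the alphabet {0, 1, 2}, with equally likely keys and arbitrary (hypothetical) a priori message weights. Every residue class then has $\varphi_i = 3$ members, so $Q_M(K) = 0$ and Q(K) = Q(M).

```python
import math
from collections import defaultdict
from itertools import product

n = 3                                        # toy alphabet {0, 1, 2}
messages = list(product(range(n), repeat=2)) # two-letter messages
weights = [1, 5, 2, 7, 3, 1, 4, 2, 6]        # arbitrary a priori weights (hypothetical)
P_M = {m: w / sum(weights) for m, w in zip(messages, weights)}
keys = range(n)                              # Caesar: add k to each letter; keys equally likely

def encipher(k, m):
    return tuple((x + k) % n for x in m)

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Direct computation: Q(K) = |K| + |M| - |E|.
P_E = defaultdict(float)
for k in keys:
    for m in messages:
        P_E[encipher(k, m)] += P_M[m] / n
Q_direct = math.log2(n) + H(P_M) - H(P_E)

# Residue classes C_i (all shifts of a message), their probabilities and sizes phi_i.
classes = {frozenset(encipher(k, m) for k in keys) for m in messages}
P_C = {c: sum(P_M[m] for m in c) for c in classes}
Q_formula = math.log2(n) + H(P_M) + sum(P_C[c] * math.log2(P_C[c] / len(c)) for c in classes)

# Every class has phi_i = n members here, so Q_M(K) = 0 and Q(M) = |M| - [-sum P(C_i) log P(C_i)].
print(math.isclose(Q_direct, Q_formula), math.isclose(Q_direct, H(P_M) - H(P_C)))
```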

21. Key Appearance Characteristic

Suppose the cryptanalyst has N letters of message and the N letters of the equivalent cryptogram. Then he can calculate the a posteriori probabilities of the various keys on the basis of this information, and if N is small there will remain a certain equivocation of key. For example, in simple substitution, knowing 20 letters of message and cryptogram does not disclose the entire key, since only about 12 letters of the 26 will be represented. Thus there is a residual equivocation of log (26 − 12)! if exactly 12 letters appear. We define the mean residual key equivocation as

$$Q_M(K) = -\sum_{E,M,K} P(E,M)\, P_{E,M}(K) \log P_{E,M}(K),$$

where P(E, M) is the a priori probability of having message M and cryptogram E, and $P_{E,M}(K)$ is the conditional probability of K with E and M given.

By obvious arguments (assuming all keys equally likely), this may be written

$$Q_M(K) = \sum_{M,K} P(M,K) \log X(M,K),$$

where X(M,K) is the number of different keys from M in parallel with K, that is, the keys which go to the same E as K.

For simple substitution let P_x be the probability that a received cryptogram of N letters has x different letters appearing in it. Then

$$Q_M(K) = \sum_x P_x \log (26 - x)!.$$

The terms log (26 - x)! vary slowly with x. If P_x is fairly well concentrated, we may take them out of the sum and replace x by its mean value x̄_N. This gives

$$Q_M(K) \approx \log (26 - \bar{x}_N)!.$$

This residual key equivocation is shown for simple substitution on English in Fig. 12. It measures how much of the key has not been used, on the average, in enciphering N letters of text.
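The approximation Q_M(K) ≈ log (26 - x̄_N)! is easy to evaluate once the mean number of distinct letters x̄_N is known. The sketch below assumes independent letters with given probabilities, so that x̄_N = Σ_i [1 - (1 - p_i)^N]. It compares the approximation with a Monte Carlo estimate of Σ_x P_x log (26 - x)!. English letter frequencies are not given here, so equiprobable letters are used as a stand-in, and the numbers will not reproduce Fig. 12. Logarithms are to base 10, matching the memo's decimal digits.

```python
import math
import random

def mean_letters_seen(probs, n):
    """Expected number of distinct letters among n independent letters: sum_i 1 - (1 - p_i)^n."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)

def log10_factorial(x):
    """log10(x!) through the gamma function, so a non-integer mean can be used."""
    return math.lgamma(x + 1.0) / math.log(10.0)

def residual_key_approx(probs, n):
    """Q_M(K) ~ log (26 - x_N)!, with x_N the mean number of letters seen (decimal digits)."""
    return log10_factorial(len(probs) - mean_letters_seen(probs, n))

def residual_key_sampled(probs, n, trials=20000, seed=1):
    """Monte Carlo estimate of sum_x P_x log (26 - x)!, for comparison."""
    rng = random.Random(seed)
    alphabet = range(len(probs))
    total = 0.0
    for _ in range(trials):
        seen = len(set(rng.choices(alphabet, weights=probs, k=n)))
        total += log10_factorial(len(probs) - seen)
    return total / trials

# Assumption: equiprobable letters stand in for English letter frequencies.
uniform = [1 / 26] * 26
for n in (0, 10, 20, 40):
    print(n, round(residual_key_approx(uniform, n), 2),
          round(residual_key_sampled(uniform, n, trials=2000), 2))
```

Because log (26 - x)! is convex in x, the approximation sits slightly below the sampled value. The gap is small when P_x is concentrated, as the text assumes.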

Theorem 19: Q(K) = Q(M) + Q_M(K).

That is, the total key equivocation (when we do not know the message) is the sum of the message equivocation and the residual key equivocation. The residual key equivocation is the equivocation there would be in the key if we did know the message. The theorem follows from the fact that the key, together with the cryptogram, uniquely determines the message (properties 4 and 5 in Section X).

22. Equivocation for Simple Substitution on an Independent Letter Language

We will now calculate the mean equivocation in key or message when simple substitution is applied to a two-letter language. The letters 0 and 1 have probabilities p and q, and successive letters are independent. We have

$$Q(K) = Q(M) = -\sum_E P(E) \sum_K P_E(K) \log P_E(K).$$

The probability of a particular cryptogram E containing exactly s 0's is

$$P(E) = \tfrac{1}{2}\left(p^s q^{N-s} + q^s p^{N-s}\right),$$

and the a posteriori probabilities of the identity and inverting substitutions are respectively

$$P_E(\mathrm{identity}) = \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}, \qquad P_E(\mathrm{inverting}) = \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}.$$


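There are C(N,s) such cryptograms for each s, and hence

$$Q(N) = -\sum_{s=0}^{N} \binom{N}{s} \tfrac{1}{2}\left[p^s q^{N-s} \log \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}} + q^s p^{N-s} \log \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}\right].$$

Using the symmetry s ↔ N - s, this may be written

$$Q(N) = -\sum_{s} \binom{N}{s} p^s q^{N-s}\left[s \log p + (N-s)\log q - \log\left(p^s q^{N-s} + q^s p^{N-s}\right)\right]$$

$$= -N\,[p \log p + q \log q] + \sum_s \binom{N}{s} p^s q^{N-s} \log\left(p^s q^{N-s} + q^s p^{N-s}\right)$$

$$= NR + \tfrac{1}{2}\sum_s \binom{N}{s}\left(p^s q^{N-s} + q^s p^{N-s}\right)\log\left(p^s q^{N-s} + q^s p^{N-s}\right),$$

where R = -(p log p + q log q).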

For p = 1/3, q = 2/3 and for p = 1/8, q = 7/8, Q has been calculated and is shown in Fig. 13.
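The two expressions for Q(N) can be checked against each other numerically. This Python sketch evaluates, in bits, the posterior-entropy sum directly and also the closed form NR + ½ Σ C(N,s)(a_s + b_s) log (a_s + b_s). It also checks properties that follow from the discussion:

- Q(0) = |K| = 1 bit.
- Q stays at 1 bit when p = q = 1/2 (no redundancy).
- Q(N) never increases.
- Q(N) respects the closed-system bound Q(K) ≥ |K| - ND, where D = 1 - R.

The function names are our own.

```python
import math

def q_binary(n, p):
    """Q(N) in bits: identity or inverting substitution, equally likely, on independent
    binary letters with P(0) = p and P(1) = q = 1 - p."""
    q = 1.0 - p
    total = 0.0
    for s in range(n + 1):              # s = number of 0's in the cryptogram
        a = p ** s * q ** (n - s)       # P(E | identity)
        b = q ** s * p ** (n - s)       # P(E | inverting)
        posterior = (a / (a + b), b / (a + b))
        h = -sum(x * math.log2(x) for x in posterior if x > 0)
        total += math.comb(n, s) * 0.5 * (a + b) * h   # C(N, s) cryptograms, each with P(E)
    return total

def q_binary_closed(n, p):
    """Closed form NR + 1/2 sum_s C(N, s)(a_s + b_s) log (a_s + b_s)."""
    q = 1.0 - p
    rate = -(p * math.log2(p) + q * math.log2(q))
    correction = 0.0
    for s in range(n + 1):
        ab = p ** s * q ** (n - s) + q ** s * p ** (n - s)
        correction += 0.5 * math.comb(n, s) * ab * math.log2(ab)
    return n * rate + correction

for p in (1 / 3, 1 / 8):
    print(p, [round(q_binary(n, p), 3) for n in range(0, 21, 4)])
```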

Now assume the language contains r different letters, chosen independently with probabilities p_1, p_2, ..., p_r. By approximately the same argument we have

$$Q(N) = -\sum \binom{N}{s_1 \ldots s_r} p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r} \log \frac{p_1^{s_1} \cdots p_r^{s_r}}{\sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r}},$$

where Σ s_i = N, the outer sum is over all (s_1, ..., s_r), and Σ_α is over all permutations α of 1, 2, ..., r.

Hence, by obvious transformations,

$$Q(N) = NR + \sum \binom{N}{s_1 \ldots s_r} p_1^{s_1} \cdots p_r^{s_r} \log \sum_\alpha p_{\alpha_1}^{s_1} \cdots p_{\alpha_r}^{s_r},$$

where R = -Σ p_i log p_i. In particular,

$$Q(0) = \log r! = |K|,$$

$$Q(1) = R + \sum_i p_i \log (r-1)! = R + \log (r-1)!.$$

This checks the evident answer for Q(1): the first symbol has equivocation R, and the parts of the key not used add log (r-1)!.
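For small cases the r-letter formula can be checked by brute force. The sketch below enumerates every message of N letters and every one of the r! substitutions (assumed equally likely). It computes H(K | E) directly and compares the result with NR + Σ multinomial · p^s · log Σ_α p_α^s. The letter probabilities are illustrative. With equal probabilities the formula returns log r! for every N, which is the strongly ideal behavior discussed in Section 28.

```python
import itertools
import math
from collections import defaultdict

def entropy(dist):
    """Entropy in bits of a dict of probabilities."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

def q_bruteforce(probs, n):
    """Q(N) = H(K | E) by enumeration; keys are all r! substitutions, equally likely."""
    r = len(probs)
    keys = list(itertools.permutations(range(r)))
    p_e, p_ke = defaultdict(float), defaultdict(float)
    for msg in itertools.product(range(r), repeat=n):
        pm = math.prod(probs[c] for c in msg)
        for key in keys:
            e = tuple(key[c] for c in msg)
            p_e[e] += pm / len(keys)
            p_ke[key, e] += pm / len(keys)
    return entropy(p_ke) - entropy(p_e)

def compositions(n, r):
    """All (s_1, ..., s_r) of nonnegative integers with s_1 + ... + s_r = n."""
    if r == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, r - 1):
            yield (first,) + rest

def q_closed(probs, n):
    """Q(N) = NR + sum_s multinomial(N; s) prod p_i^s_i log sum_alpha prod p_alpha(i)^s_i."""
    r = len(probs)
    rate = -sum(p * math.log2(p) for p in probs)
    perms = list(itertools.permutations(range(r)))
    total = n * rate
    for s in compositions(n, r):
        multinom = math.factorial(n) // math.prod(math.factorial(x) for x in s)
        weight = math.prod(p ** x for p, x in zip(probs, s))
        inner = sum(math.prod(probs[a[i]] ** s[i] for i in range(r)) for a in perms)
        total += multinom * weight * math.log2(inner)
    return total

probs = (0.5, 0.3, 0.2)   # illustrative letter probabilities
for n in range(5):
    print(n, round(q_bruteforce(probs, n), 6), round(q_closed(probs, n), 6))
```

The closed form sums over letter counts rather than over messages, so it stays cheap for much larger N than the enumeration.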

23. The Equivocation Characteristic for a "Random" Closed Cipher


In the preceding section we calculated the equivocation characteristic for simple substitution applied to an independent letter language. This is about the simplest type of cipher and the simplest language structure possible, yet the formulas are already so involved as to be nearly useless. What are we to do with cases of practical interest? Consider, say, the involved transformations of a fractional transposition system applied to English, with its extremely complex statistical structure. This complexity itself suggests a method of approach: sufficiently complicated problems can frequently be solved statistically. To do this we define the notion of a "random" cipher.


We suppose that the possible messages of length N can be divided into two groups. One group has high and fairly uniform probability, while the total probability in the second group is small. This is usually possible in information theory if the messages have any reasonable length. Let the total number of messages be

$$H = 2^{R_0 N},$$

where R_0 is the maximum rate and N the number of letters. The high probability group will contain about

$$S = 2^{R N}$$

messages, where R is the statistical rate.

The deciphering operation defines a function M = T_i^{-1}E. It can be thought of as a series of lines, k for each E, going back to various M's. By a random cipher we mean one in which all keys are equally likely and the k lines from any E go back to random M's. The equivocation in key is given by

$$Q(K) = -\sum_E \sum_K P(E)\, P_E(K) \log P_E(K).$$

The probability of exactly m lines going back to the high probability group is

$$\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m}.$$

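If a cryptogram with m lines going to high probability messages is intercepted, the equivocation is log m. A particular cryptogram of this kind has probability m/(Sk), since each of its m high probability decipherments has probability about 1/S and each key has probability 1/k. There are H cryptograms in all, so the probability of intercepting such a cryptogram is mH/(Sk) times the probability above. Hence the mean equivocation is

$$Q = \sum_{m} \frac{mH}{Sk} \binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} \log m.$$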

We wish to find an approximation to this for large k. Suppose the expected value of m, namely m̄ = Sk/H, is ≫ 1. Then the variation of log m over the range where the binomial distribution assumes large values will be small, and we can replace log m by log m̄. This then comes out of the summation, leaving the expected value of m/m̄, which is 1. Hence in this condition

$$Q \approx \log \frac{Sk}{H} = \log S - \log H + \log k = |M| - |M_0| + |K| = |K| - ND.$$

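If m̄ is small compared to the large k, the binomial distribution can be approximated by a Poisson distribution:*

$$\binom{k}{m}\left(\frac{S}{H}\right)^m\left(1 - \frac{S}{H}\right)^{k-m} \approx \frac{e^{-\lambda}\lambda^m}{m!}, \qquad \lambda = \frac{Sk}{H}.$$

Hence

$$Q \approx \frac{1}{\lambda}\sum_{m=2}^{\infty} \frac{e^{-\lambda}\lambda^m}{m!}\, m \log m = e^{-\lambda}\sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m+1),$$

where we have written (m+1) for m. This form is used in the region where λ is near unity. For λ ≪ 1 the only important term in the series is m = 1. Omitting the others,

$$Q \approx e^{-\lambda}\lambda \log 2 \approx \lambda \log 2 = 2^{|K| - ND} \log 2.$$

*Fry, Probability and Its Engineering Uses, p. 214.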

Thus Q(K) starts off at |K| and decreases linearly with slope -D out to the neighborhood of N = |K|/D. After a short transition region, Q follows an exponential with a "half-life" distance of 1/D if D is in alternatives per letter. If D is in digits per letter, 1/D is the distance for a decrease by a factor of 10. The behavior is shown in Fig. 14 with the approximating curves.
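The random-cipher characteristic can be evaluated numerically. In the Poisson form, Q = e^{-λ} Σ λ^m/m! log (m + 1) is the mean of log (m + 1) for m Poisson with mean λ = Sk/H = 2^{|K| - ND}. One expression therefore covers both the linear region (λ ≫ 1, Q ≈ |K| - ND) and the exponential tail (λ ≪ 1, Q ≈ λ in bits). The sketch below computes Q in bits. For a small case it also evaluates the binomial sum directly, as a comparison. The values |K| = 20 bits and D = 1 bit per letter are illustrative assumptions.

```python
import math

def expected_log2_poisson(lam):
    """e^-lam * sum_m lam^m / m! * log2(m + 1): the mean of log2(m + 1), m ~ Poisson(lam)."""
    if lam <= 0.0:
        return 0.0
    width = 12.0 * math.sqrt(lam) + 30.0          # the pmf is negligible outside this window
    lo, hi = max(0, int(lam - width)), int(lam + width) + 1
    total = 0.0
    for m in range(lo, hi + 1):
        log_pmf = -lam + m * math.log(lam) - math.lgamma(m + 1)
        total += math.exp(log_pmf) * math.log2(m + 1)
    return total

def random_cipher_q(key_bits, redundancy, n):
    """Key equivocation in bits for the random cipher, with lambda = Sk/H = 2^(|K| - ND)."""
    return expected_log2_poisson(2.0 ** (key_bits - redundancy * n))

def random_cipher_q_binomial(k, s_over_h):
    """Direct binomial form: sum_m (mH/Sk) C(k, m) (S/H)^m (1 - S/H)^(k - m) log2 m."""
    lam = k * s_over_h
    total = 0.0
    for m in range(1, k + 1):
        log_pmf = (math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)
                   + m * math.log(s_over_h) + (k - m) * math.log1p(-s_over_h))
        total += (m / lam) * math.exp(log_pmf) * math.log2(m)
    return total

# Illustrative: |K| = 20 bits, D = 1 bit per letter, so the knee is near N = 20.
for n in (0, 10, 18, 20, 22, 25, 30):
    print(n, round(random_cipher_q(20, 1.0, n), 5))
```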

By a similar argument, given in the appendix, the equivocation of message can be calculated. It is

$$Q(M) \approx R_0 N \quad \text{for } R_0 N \ll Q(K) = |K| - DN,$$

$$Q(M) \approx Q(K) \quad \text{for } R_0 N \gg Q(K),$$

$$Q(M) \approx Q(K) - \varphi(N) \quad \text{for } R_0 N \approx Q(K),$$

where φ(N) is the function of Fig. 14 with the N scale reduced by a factor of D/R_0. Q(M) rises linearly with slope R_0 until this line intersects the Q(K) line. After a rounded transition it follows Q(K) down.

Most ciphers have an equivocation characteristic of this general type, approaching zero rather sharply. We will call the number of letters required for a nearly unique solution the unicity distance.

24. Application to Standard Ciphers

The characteristic derived for the random cipher may be expected to apply approximately in many cases, provided some precautions are taken and certain corrections are made. The main points to be observed are the following.

1. In deriving the random characteristic we assumed that the possible decipherments of a cryptogram are a random selection from the possible messages. This is not true in actual cases. It becomes more nearly true as the complexity of the enciphering operations and the complexity of the language structure increase, so the more complicated the type of cipher, the more closely it should follow the random characteristic. In the case of a transposition cipher, letter frequencies are clearly preserved. The possible decipherments are therefore chosen from a more limited group rather than the entire message space, and the formula should be changed. In place of R_0 one uses R_1, the rate for independent letters with the regular frequencies. This changes the redundancy from D = R_0 - R = .707 digits/letter to D' = R_1 - R = .538 digits/letter, and the equivocation decreases more slowly. In some other cases a definite tendency can be seen for the decipherments to return to high probability messages. Suppose there is no clear tendency of this sort, the system is fairly complicated, and the language is a natural one (with its very complex statistical structure). Then it is reasonable to make the random cipher assumption.

2. In many cases the key does not all appear as soon as it might. For example, in simple substitution one must wait a long time to find all letters of the alphabet represented in the message and thus deduce the complete key. The message becomes unique long before this point. Our random assumption obviously falls down in such a case. All the different keys that differ only in the letters not yet appearing lead back to the same message, and are not randomly distributed. This error is easily corrected by the use of the key appearance characteristic: at a particular N, one uses in the formula for Q(K) the amount of key that may be expected at that point.

3. There are certain "end effects", due to the definite starting point of the message, which produce a discrepancy from the random characteristic. If we take a random starting point in English text, the first letter (when we do not observe the preceding letters) may be any letter, with the ordinary letter probabilities. The next letter is more completely specified, since we then have digram frequencies. This decrease in choice value continues for some time. As a result, the straight line part of the curve is displaced, and it is approached by a curve whose shape depends on how far the statistical structure of the language is spread out over adjacent letters. As a first approximation the curve can be corrected by shifting the line over to the half redundancy point, i.e., the number of letters at which the language redundancy is half its final value.

If account is taken of these three effects, reasonable estimates of the equivocation characteristic and the unicity point can be made. The calculation can be done graphically, as indicated in Figs. 15 and 16. One draws the key appearance characteristic |K| - Q_M(K) and the total redundancy curve |M_0| - |M| (which is usually sufficiently well represented by the line ND). The difference between these, out to the neighborhood of their intersection, gives the equivocation. For simple substitution the characteristic is shown in Fig. 17. In so far as experimental checks could be carried out, they fit this curve very well. For example, the unicity point, at about 27 letters, can be shown experimentally to lie between the limits 22 and 30. With 30 letters one nearly always has a unique solution to a cryptogram of this type, and with 22 it is usually easy to find a number of solutions.

With transposition of period d, the unicity point occurs at about 1.5 d log (d/e). This also checks fairly well experimentally. Note that in this case Q is defined only for integral multiples of d.

With the Vigenere the unicity point will occur at about 2d + 2 letters, and this too is about right. The Vigenere characteristic with the same key size as simple substitution will be approximately as shown in Fig. 18. The Vigenere, Playfair and Fractional cases are more likely than simple substitution and transposition to follow the theoretical formulas for random ciphers. The reason is that they are more complex and give better mixing characteristics to the messages on which they operate.
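The unicity estimates quoted above follow from N ≈ |K|/D, with logarithms to base 10. The small sketch below uses the memo's D = .707 decimal digits per letter:

- For the Vigenere, |K| = d log 26, so |K|/D ≈ 2.0 d, plus about 2 letters for end effects.
- For transposition, |K| = log d!, which Stirling's formula turns into roughly d log (d/e).

The factor 1.5 in the transposition rule appears to correspond to 1/D ≈ 1.4. For d = 20, the exact log d!/D and the rule 1.5 d log (d/e) both come out near 26 letters. Apart from the constant 2 for the Vigenere, the sketch ignores the key appearance and end effect corrections.

```python
import math

D = 0.707   # redundancy of English in decimal digits per letter, as quoted in the memo

def unicity(key_digits, redundancy=D):
    """Random-cipher estimate of the unicity distance, N ~ |K| / D, logs to base 10."""
    return key_digits / redundancy

def vigenere_unicity(d, end_effect=2.0):
    """Vigenere with period d: |K| = d log 26, plus a few letters for end effects."""
    return unicity(d * math.log10(26)) + end_effect

def transposition_unicity(d, redundancy=D):
    """Transposition with period d: |K| = log d!, computed exactly via lgamma."""
    return unicity(math.lgamma(d + 1) / math.log(10), redundancy)

def transposition_rule(d):
    """The rule of thumb 1.5 d log (d/e), i.e. Stirling's d log (d/e) for log d!."""
    return 1.5 * d * math.log10(d / math.e)

for d in (5, 10, 20):
    print(d, round(vigenere_unicity(d), 1),
          round(transposition_unicity(d), 1), round(transposition_rule(d), 1))
```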


The mixed alphabet Vigenere (each of d alphabets mixed independently and used sequentially) has a key size

$$|K| = d \log 26! \approx 26.6\, d,$$

and its unicity point should be at about 53d + 2 letters.

These conclusions can also be put to a rough experimental test with the Caesar type cipher. For the particular cryptogram analyzed in Table I, Section 19, the function Q(N) has been calculated. It is given below, together with the values for a random cipher.

N                 0      ...
Q (observed)      1.41   ...
Q (calculated)    1.41   ...

The agreement is seen to be quite good. This is especially so when we remember that the observed Q should actually be the average over many different cryptograms, and that D for the larger values of N is only roughly estimated.

It appears then that the random cipher analysis 
can be used to estimate equivocation characteristics and 
the unicity distance for the ordinary types of ciphers. 

25. Solving Systems Using Only N-Gram Structure

The preceding analysis can also be applied to cases where the cryptanalyst is assumed to know, or use, only limited knowledge of the structure of the language. Suppose no data about the language other than the digram frequencies is used in solving cryptograms. The equivocation curves may then be computed using, for the redundancy curve, the one obtained from D_2 alone. This curve lies below the curve for the full redundancy, so the unicity point will be moved to a larger N. Fig. 19 shows the Q curves for simple substitution on normal English when the cryptanalyst uses only digram structure.

26. Validity of a Cryptogram Solution

The equivocation formulas are relevant to questions that sometimes arise in cryptographic work regarding the validity of an alleged solution to a cryptogram. In the history of cryptography one finds many cryptograms, or possible cryptograms, where clever analysts have found a "solution". However, the process involved was so complex, or the material so scanty, that the question arose whether the cryptanalyst had "read a solution" into the cryptogram. See, for example, the Bacon-Shakespeare ciphers and the "Roger Bacon" manuscript.*

In general we may say that if a proposed system and key solve a cryptogram for a length of material considerably greater than the unicity distance, the solution is trustworthy. If the material is of the same order as the unicity distance or shorter, the solution is highly suspicious.

This effect of redundancy in gradually producing a unique solution to a cipher can be thought of in another, helpful way. The redundancy is essentially a series of conditions on the letters of the message which ensure that it is statistically reasonable. These consistency conditions produce corresponding consistency conditions in the cryptogram. The key gives a certain amount of freedom to the cryptogram, but as more and more letters are intercepted, the consistency conditions use up the freedom allowed by the key. Eventually only one message and key satisfy all the conditions, and we have a unique solution. In the random cipher the consistency conditions are in a sense "orthogonal" to the "grain of the key", and they have their full effect in eliminating messages and keys as rapidly as possible. This is the usual case. However, by proper design it is possible to "line up" the redundancy of the language with the "grain of the key". The consistency conditions are then automatically satisfied, and Q does not approach zero. These "ideal" systems are of such a nature that the transformations T_i all induce the same probabilities in the E space. Ideal characteristics are shown in Fig. 20.

27. Ideal Secrecy Systems

We have seen that perfect secrecy requires an infinite amount of key. With a finite key size, the equivocation of key and message generally approaches zero, but not necessarily. In fact it is possible for Q(K) to remain constant at its initial value |K|. Then, no matter how much material is intercepted, there is not a unique solution but many solutions of comparable probability. We define an "ideal" system as one in which Q(K) and Q(M) do not approach zero as N → ∞. A "strongly ideal" system is one in which Q(K) remains constant at |K|.

*See Fletcher Pratt, "Secret and Urgent" 


An example is simple substitution on an artificial language in which all letter probabilities are the same and each letter is chosen independently. Clearly Q(K) = |K|. Q(M) rises linearly along a line of slope R_0 until it strikes the line Q(K), after which it remains constant at this value.

With natural languages it is in general possible to approximate the ideal characteristic: the unicity point can be made to occur at as large an N as desired. However, the complexity of the system needed usually goes up rapidly as we attempt to do this. It is not always possible to actually attain the ideal characteristic with any system of finite complexity.

To approximate the ideal equivocation, one may first operate on the message with a transducer that reduces it to the normal form, i.e., with all redundancies removed. After this, almost any simple ciphering system (substitution, transposition, Vigenere, etc.) is satisfactory. The more elaborate the transducer and the nearer its output is to normal form, the more closely the secrecy system will approximate the ideal characteristic.

Theorem 20: A necessary and sufficient condition that T be strongly ideal is that, for any two keys, T_j^{-1} T_i is a measure preserving transformation of the message space into itself.

This is true since the a posteriori probability of each key is equal to its a priori probability if and only if this condition is satisfied.

28. Examples of Ideal Secrecy Systems

Suppose our language consists of a sequence of letters, all chosen independently and with equal probability. Then the redundancy is zero, |M_0| = |M|, and from Theorem 11, Q(K) = |K|. We obtain the result

Theorem 21: If all letters are equally likely and independent, any closed cipher is strongly ideal.

The equivocation of message will rise along the key appearance characteristic |K| - Q_M(K). This will usually approach |K|, although in some cases it does not. In the cases of N-gram substitution, transposition, Vigenere and its variations, fractional, etc., we have strongly ideal systems for this simple language, with Q(M) → |K| as N → ∞.

If the letters are independent but not all equally probable, the transposition cipher characteristics remain essentially the same. The asymptotic equivocations of both key and message are clearly |K|. In the substitution cipher they will be less. If all the letter probabilities are different, the asymptotic equivocations of both key and message are zero: the letters can all eventually be determined by frequency count (apart from certain exceptional sequences of zero measure). Suppose now that there are 7 letters with probabilities

p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7.

In this case we cannot separate p_1 from p_2, nor p_4, p_5 and p_6 from each other, but the different unequal probability groups can eventually be separated.

If all substitutions are a priori equally likely, there will be an asymptotic uncertainty among 2! × 3! equally likely (a posteriori) keys. Hence the asymptotic Q is

$$Q_\infty = \log (2!\,3!).$$

In general, the asymptotic equivocation of a substitution cipher in which the different substitutions are equally likely is clearly

$$Q_\infty(M) = Q_\infty(K) = \log H,$$

where H is the order of the group of substitutions on the letter probabilities p_1, ..., p_n that leave this set invariant.
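The group order H can be computed directly. A substitution leaves the set of letter probabilities invariant exactly when it permutes letters of equal probability among themselves. H is therefore the product of the factorials of the multiplicities of the distinct probability values. The sketch below uses illustrative probabilities that follow the pattern above. It gives 2! × 3! = 12, so the asymptotic equivocation is log 12, about 3.58 bits. Fractions are used so that equal probabilities compare exactly.

```python
import math
from collections import Counter
from fractions import Fraction

def asymptotic_equivocation_bits(probs):
    """log2 of the number of letter permutations that leave the probabilities unchanged.

    Letters with equal probability can be permuted among themselves and nothing else
    can, so the group order is the product of the factorials of the multiplicities.
    """
    multiplicities = Counter(probs).values()
    return math.log2(math.prod(math.factorial(c) for c in multiplicities))

# Illustrative values with the pattern p1 = p2 < p3 < p4 = p5 = p6 < p7.
example = [Fraction(1, 20)] * 2 + [Fraction(1, 10)] + [Fraction(1, 6)] * 3 + [Fraction(3, 10)]
assert sum(example) == 1
print(asymptotic_equivocation_bits(example), math.log2(math.factorial(2) * math.factorial(3)))
```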

More generally we can consider an arbitrary pure system T and a pure language L. Suppose that T operates only "locally" on the letters of L, in the sense that the nth letter of the cryptogram depends only on n and a certain finite number of the letters of M in the neighborhood of the nth one:

$$e_n = f(K, n; m_{n-p}, \ldots, m_{n+p}).$$

We can then show that there is a certain subgroup of the transformations T_j^{-1} T_i that are probability preserving in the language L. In the limiting cases this subgroup would consist of the identity alone or of the whole group of T_j^{-1} T_i.

Theorem 22: Under these conditions the asymptotic equivocation of key is the logarithm of the order of this subgroup of measure preserving transformations.

An ideal secrecy system suffers from a number of drawbacks:

1. The system must be closely matched to the language. This requires an extensive study of the structure of the language by the designer. Also, a change in statistical structure, or a selection from the set of possible messages, renders the system vulnerable to analysis. An example of such a selection is probable words (words expected in this particular cryptogram).

2. The structure of natural languages is extremely complicated, and this is reflected in the complexity of the transformations required to reduce them to the normal form. Any machine to perform this operation must therefore be quite involved, at least as regards information storage, since a "dictionary" larger than an ordinary dictionary is to be expected.

3. In general, reducing a natural language to a normal form introduces a bad error propagation characteristic. An error in transmission of a single letter produces a region of changes near it, of a size comparable to the length of statistical effects in the original language.

29. Multiple Substitute Ideal Systems

There is another way of obtaining ideal or nearly ideal characteristics, using multi-valued secrecy systems. Suppose our language contains only three letters, with probabilities 1/8, 3/8 and 4/8, and that successive letters in a message are chosen independently. Let there be 1 substitute for the first letter, 3 for the second and 4 for the third, and choose at random among the possible substitutes for a letter. Each of the 8 substitutes then appears with probability 1/8, independently of the others, so this system is clearly ideal. If the different probabilities are incommensurable, we cannot achieve the ideal behavior exactly. We can, however, approximate it as closely as desired by using enough substitutes.
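The three-letter example can be made concrete. The sketch below makes some arbitrary choices: the letter names A, B and C, and the random seed. It assigns 1, 3 and 4 of eight cipher symbols to the letters at random. It enciphers by picking a substitute at random and computes the exact probability of each cipher symbol. Every symbol comes out with probability 1/8, so the cryptogram is a sequence of independent, equally likely symbols and carries no frequency information.

```python
import random
from fractions import Fraction

# The text's example: three letters with probabilities 1/8, 3/8, 4/8 (labels A, B, C are ours).
PROBS = {"A": Fraction(1, 8), "B": Fraction(3, 8), "C": Fraction(4, 8)}

def make_key(probs, n_symbols, rng):
    """Randomly split the symbols 0..n-1 among the letters, n * p(letter) symbols each."""
    symbols = list(range(n_symbols))
    rng.shuffle(symbols)
    key, start = {}, 0
    for letter, p in probs.items():
        count = p * n_symbols
        if count.denominator != 1:
            raise ValueError("each probability must be a multiple of 1/n_symbols")
        key[letter] = symbols[start:start + int(count)]
        start += int(count)
    return key

def encipher(msg, key, rng):
    """Replace each letter by one of its substitutes, chosen at random."""
    return [rng.choice(key[letter]) for letter in msg]

def decipher(cipher, key):
    inverse = {s: letter for letter, subs in key.items() for s in subs}
    return "".join(inverse[s] for s in cipher)

def symbol_probabilities(probs, key):
    """Exact probability of each cipher symbol: p(letter) / (number of its substitutes)."""
    return {s: probs[letter] / len(subs) for letter, subs in key.items() for s in subs}

rng = random.Random(0)
key = make_key(PROBS, 8, rng)
print(key, symbol_probabilities(PROBS, key))
```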

If the language is more complex, with transition probabilities, this general method can still be used, but it becomes more involved. Suppose the choice of a letter depends only on the two preceding letters, not on any more remote part of the message. The transition probabilities p_{ij}(k) then completely describe the statistical structure of the language. We supply substitutes for k, when it follows i, j, in proportion to p_{ij}(k): of all our m substitutes, m p_{ij}(k) represent k after the pair i, j. As before, one chooses at random from the possible substitutes for a letter. The cryptogram will then be a random sequence of the m substitutes.

As an example, suppose the p_i(j) are the only statistics of the language, and their values are given by

 i \ j     1      2      3
   1      .1     .3     .6
   2      .2     .5     .3
   3      .9 and .1 (one entry is 0; its position is not recoverable)

With 10 substitutes 0, 1, 2, ..., 9 we construct a substitution table, assigning substitutes (chosen randomly) in proportion to the frequencies. The following is a typical key.

 i \ j     1       2            3
   1       7       0,5,6        1,2,3,4,8,9
   2       3,9     1,2,5,6,7    0,4,8
   3       0,1,2,3,5,6,7,8,9 and 4 (column positions not recoverable)

If a 3 follows a 2 in the message, we substitute one of 0, 4, 8 for it, the choice being random. A second table must be supplied for the first letter of the message, corresponding to the unconditional probabilities of the three letters.
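The same construction, with one table per preceding letter, can be sketched as follows. Rows 1 and 2 of the transition matrix follow the example above. Row 3 and the first-letter probabilities are hypothetical stand-ins, since they are not fully given. Given the preceding message letter, every one of the 10 substitutes has conditional probability exactly 1/10, which is what makes the cryptogram a random sequence. Deciphering works because, within each table, every substitute stands for exactly one letter.

```python
import random
from fractions import Fraction

# p_i(j): probability that letter j follows letter i. Rows 1 and 2 follow the text's
# example; row 3 is a hypothetical stand-in because its entries are not fully given.
P = {
    1: {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(6, 10)},
    2: {1: Fraction(2, 10), 2: Fraction(5, 10), 3: Fraction(3, 10)},
    3: {1: Fraction(0), 2: Fraction(9, 10), 3: Fraction(1, 10)},
}
START = {1: Fraction(3, 10), 2: Fraction(4, 10), 3: Fraction(3, 10)}  # hypothetical first-letter table

def random_row(probs, n_symbols, rng):
    """Split the symbols 0..n-1 at random among the letters, n * p(letter) symbols each."""
    symbols = list(range(n_symbols))
    rng.shuffle(symbols)
    row, start = {}, 0
    for letter, p in probs.items():
        count = p * n_symbols
        if count.denominator != 1:
            raise ValueError("each probability must be a multiple of 1/n_symbols")
        row[letter] = symbols[start:start + int(count)]
        start += int(count)
    return row

def make_key(rng, n_symbols=10):
    """One table per preceding letter, plus a table (key None) for the first letter."""
    key = {prev: random_row(row, n_symbols, rng) for prev, row in P.items()}
    key[None] = random_row(START, n_symbols, rng)
    return key

def encipher(msg, key, rng):
    out, prev = [], None
    for letter in msg:
        out.append(rng.choice(key[prev][letter]))  # random substitute for this context
        prev = letter
    return out

def decipher(cipher, key):
    msg, prev = [], None
    for symbol in cipher:
        letter = next(cand for cand, subs in key[prev].items() if symbol in subs)
        msg.append(letter)
        prev = letter
    return msg

rng = random.Random(1)
key = make_key(rng)
message = [1, 2, 3, 2, 2, 1, 3, 3]   # avoids the zero-probability pair (3, 1) of the stand-in row
print(encipher(message, key, rng))
```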

Although such systems are of theoretical interest, it is doubtful whether they would be of much practical use, because of their complexity and the message expansion they cause in ordinary cases. However, the first approximation to such systems, matching letter frequencies, has been used in ciphers. It is standard practice in codes, where one matches word frequencies.

30. Equivocation Rate

We now return briefly to cases where the key is not finite but is supplied constantly, as in the Vernam system and the running key cipher. In such cases we may define equivocation "rates". One considers the equivocation Q(N) of the message when N letters have been intercepted. The equivocation rate for the message is defined as the limit (assuming it exists)

$$Q' = \lim_{N \to \infty} \frac{Q(N)}{N}.$$

The rate for equivocation of key would be defined similarly, using only the equivocation in the part of the key that has been used; of course these two rates are the same. There are results for these parameters analogous to those obtained in the finite key cases. Let R' be the mean rate of using key.

Theorem 23:

$$Q' \le R'.$$

When equality holds we have the analogue of ideal systems, in which the complete information of the key goes into equivocation. If R' ≥ R, the rate of the message source, we can obtain perfect secrecy. In fact we may define perfect secrecy as the case in which Q' = R.

In the random case we have the analogous result

$$Q' = R' - D.$$

31. Further Remarks on Equivocation and Redundancy

We have taken the redundancy of "normal English" to be about .7 digits per letter, or 50% of R_0. This assumes that word divisions are omitted. It is an approximate figure, based on statistical structure extending over lengths of perhaps 8 letters. It assumes the text to be of an ordinary type, such as newspaper writing, literary work, etc. Various methods of calculating redundancy have been devised; they will be described in the memorandum on information mentioned in the introduction. We note here two methods of roughly estimating this number that are of cryptographic interest.

A running key cipher is a Vernam type system in which the key is a meaningful text rather than a random sequence of letters. It is known that running key ciphers can usually be solved uniquely. This shows that English can be reduced by a factor of two to one, and it implies a redundancy of at least 50%. For a number of reasons, however, this figure cannot be reduced very much unless the long range "meaning" structure of English is considered.

The running key cipher can easily be improved so as to give ciphering systems that could not be solved without the key. Suppose one uses about 4 different texts as key in place of one English text, adding them all to the message. A sufficient amount of key has then been introduced to produce a high positive equivocation rate. Another method would be to use, say, every 10th letter of the text as key. The intermediate letters are omitted and cannot be used at any other point of the message. This has the same effect, since the mean rate for these spaced letters must be over .8 R_0.

These methods might be useful for spies or diplomats, who could use books or magazines as the key source.
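The minimal sketch below shows the two improvements. It assumes that letters are combined by addition mod 26, as in the Vigenere tableau. Several key texts are summed into one running key, or only every 10th letter of a single text is used. The key texts are public-domain opening lines chosen as placeholders. The code shows the mechanics only and does not measure the resulting equivocation rate.

```python
A = ord("A")

def letters(text):
    """Keep only A-Z, as numbers 0..25."""
    return [ord(c) - A for c in text.upper() if "A" <= c <= "Z"]

def to_text(nums):
    return "".join(chr(n % 26 + A) for n in nums)

def combined_key(key_texts, length):
    """Add several key texts letter by letter (mod 26) to form one running key."""
    streams = [letters(t) for t in key_texts]
    if any(len(s) < length for s in streams):
        raise ValueError("every key text must be at least as long as the message")
    return [sum(s[i] for s in streams) % 26 for i in range(length)]

def spaced_key(text, length, step=10):
    """Use every step-th letter of a text as key; the letters in between are discarded."""
    stream = letters(text)[::step]
    if len(stream) < length:
        raise ValueError("key text too short for this spacing")
    return stream[:length]

def encipher(message, key):
    return to_text(m + k for m, k in zip(letters(message), key))

def decipher(cipher, key):
    return to_text(c - k for c, k in zip(letters(cipher), key))

# Placeholder key texts (public-domain opening lines).
texts = [
    "It was the best of times, it was the worst of times",
    "Call me Ishmael. Some years ago, never mind how long precisely",
    "It is a truth universally acknowledged, that a single man",
    "Happy families are all alike; every unhappy family is unhappy in its own way",
]
message = "Meet at the usual place at noon"
key = combined_key(texts, len(letters(message)))
print(encipher(message, key))
```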

A second way of showing the high redundancy of English is to delete all the vowels from a passage. In general it is possible to fill them in again uniquely and recover the original without knowing it in advance. As the vowels constitute about 40% of the text, this puts a lower limit on the redundancy. Actually, considerable redundancy is left, since the various letter and digram frequencies are far from uniform.

This suggests a simple way of greatly improving almost any simple ciphering system. First delete all the vowels, or as much of the message as possible without running the risk of multiple solutions, and then encipher the residue. Since this reduces the redundancy by a factor of perhaps 3 or 4 to 1, the unicity point will be moved out by this factor. This is one way of approaching ideal systems: it uses the decipherer's knowledge of English as part of the deciphering system.
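The preprocessing step is simple to state in code. This sketch drops the vowels A, E, I, O and U, treating Y as a consonant (an arbitrary choice), along with spaces and punctuation. It then applies a Caesar shift as a stand-in for "almost any simple ciphering system". It deletes every vowel rather than judging how much can be removed without ambiguity; that judgment is left to the encipherer.

```python
VOWELS = set("AEIOU")

def strip_vowels(text):
    """Drop vowels, spaces and punctuation; Y is kept as a consonant here."""
    return "".join(c for c in text.upper() if "A" <= c <= "Z" and c not in VOWELS)

def caesar(text, shift):
    """Stand-in for 'any simple ciphering system': a Caesar shift of A-Z."""
    return "".join(chr((ord(c) - ord("A") + shift) % 26 + ord("A")) for c in text)

residue = strip_vowels("Send help now")
print(residue, caesar(residue, 3))
```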

Two extremes of redundancy in English prose are represented by Basic English and Joyce's "Finnegans Wake". The Basic English vocabulary consists of only 850 words, and a rough estimate puts its redundancy at about 70%. A cipher applied to this sort of text would rapidly approach unicity. Joyce, on the other hand, would be relatively ea... The ... redundancy is disclosed by the difficulty of filling in correctly even a single missing letter from "Finnegans Wake". What the numerical value is would be difficult to determine; it varies widely throughout the ...

The mathematical extremes of redundancy, 0 and 100%, can be constructed in artificial languages. At 100% we have, e.g., a single possible message. Q(M) = 0 identically, and in the random cipher case Q(K) declines as rapidly as possible, i.e., as rapidly as one sends information over the system. At the other extreme all letter sequences are equally likely, and any closed ciphering system is ideal.

We may refer here to a memorandum by Nyquist ("Enciphering-Effect of Redundancy in Language", May 30, 1944), in which some questions of the type considered here are discussed.

32. Distribution of Equivocation

A more complete description of a secrecy system applied to a language than the equivocation characteristics afford can be given by the distribution of equivocation. For N intercepted letters we consider the fraction of cryptograms for which Q lies between certain limits. Here Q is taken for these particular E's, not as the mean. This gives a density distribution function

P(Q,N) dQ = probability that, for N letters, Q lies between the limits Q and Q + dQ.

The mean equivocation we have previously studied is the mean of this distribution.


The function P(Q,N) can be thought of as plotted along a third dimension, normal to the paper, over the Q-N plane. Suppose the language is pure, with a small influence range (compared to K), and the cipher is pure. The function P(Q,N) will then usually be a ridge in this plane whose highest point approximately follows the mean, at least until near the unicity point. In this case, or when the conditions are nearly satisfied, the mean Q curve gives a reasonably complete picture of the system.

On the other hand, the language may not be pure but made up of a set of pure components,

$$L = \sum_i p_i L_i,$$

having different equivocation curves with the system, say Q_1, Q_2, ..., Q_n. The total Q distribution will then usually be made up of a series of "ridges", one for each L_i, weighted in accordance with its p_i. The mean equivocation characteristic will be a line somewhere in the midst of these ridges and may not give a very complete picture of the situation. This is shown in Fig. 21.

A similar effect occurs if the system is not pure but made up of several systems with different Q curves. There is then a series of ridges in the P(Q,N) plot. The mean Q strikes an average that may lie between ridges and be a very improbable value of Q for a particular cryptogram. These effects are illustrated in Fig. 22.

Mixing pure languages that are near to one another in statistical structure increases the width of the ridge. Near the unicity point this tends to raise the mean equivocation, since equivocation cannot become negative and the spreading is chiefly in the positive direction. We therefore expect that in this region the calculations based on the random cipher should be somewhat low.


Practical Secrecy

33. The Work Characteristic

After the unicity point has been passed there will
usually be a unique solution to the cryptogram. The problem
of isolating this single solution of high probability is the
problem of cryptanalysis. In the region before the unicity
point we may say that the problem of cryptanalysis is that of
isolating all the possible solutions of high probability
(compared to the remainder) and determining their various
probabilities.

Although it is always possible in principle to determine
these solutions (by trial of each possible key, for example),
different enciphering systems show a wide variation in the amount
of work required. The average amount of work to determine the
key for a cryptogram of N letters, W(N), measured say in man-hours,
may be called the work characteristic of the system. This
average is taken over all messages and all keys with their
appropriate probabilities.

For a simple substitution on English the work and
equivocation characteristics would be somewhat as shown in
Fig. 23. The dotted portion of the curve is where there are
numerous possible solutions, and these must all be determined.
In the solid portion after the unicity point only one solution
exists in general, but if only the minimum necessary data are
given a great deal of work must be done to isolate it. As
more material is used the work rapidly decreases toward some
asymptotic value, where additional data no longer reduce
the labor.

This is the work characteristic for the key. It is
clear that after the unicity point this function can never
increase, since any additional material can simply be ignored.
There is also a work characteristic for the message: the
average amount of work to determine the message (or all
reasonable messages). This will, in ordinary cases, be below,
or at any rate not far above, the work characteristic for the
key, out to fairly large N, since generally if the key is
determined it is easy to find M by the deciphering transformation.
For very large N, however, this function will increase due
merely to the labor of deciphering the large amount of
intercepted material.


Essentially the behavior shown in Fig. 23 can be
expected with any type of secrecy system where the equivocation
approaches zero. The scale of man-hours required, however,
will differ greatly with different types of ciphers, even when
the Q curves are about the same. A Vigenère or compound
Vigenère, for example, with the same key size would have a much
better (i.e., much higher) work characteristic. A good practical
secrecy system is one in which the W(N) curve remains sufficiently
high, out to the number of letters one expects to transmit
with the key, to prevent the enemy from actually carrying out
the solution, or to delay it to such an extent that the
information is then obsolete.

We will consider in the following sections ways of
keeping the function W(N) large, even though Q may be
practically zero. This is essentially a "max-min" type of
problem, as is always the case when we have a battle of wits.
In designing a good cipher we must maximize the minimum amount
of work the enemy must do to break it. It is not enough merely
to be sure that none of the standard methods of cryptanalysis
work; we must show that no method whatever will break the system
easily. This has been the weakness of many systems: they were
designed to resist all the known methods of solution, but had a
structure that led to new methods of attack, which disclosed
weaknesses of their own.

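The problem of good cipher design is essentially one
of finding difficult problems, subject to certain other
conditions. This is a rather unusual situation, since one is
ordinarily seeking the simple and easily soluble problems
in a field.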

How can we ever be sure that a system which is not
ideal, and therefore has a unique solution for sufficiently
large N, will require a large amount of work to break with
every method of analysis? (Compare von Neumann's "Theory of
Games." The situation between the cipher designer and the
cryptanalyst can be thought of as a "game" of a very simple
structure: a zero-sum two-person game with complete information
and just two "moves." The cipher designer chooses a system for
his move. The cryptanalyst is informed of this choice and
chooses a method of analysis. The "value" of the play is the
average work required to break a cryptogram in the system by
the method chosen.) There are two approaches to this problem:

(1) We can study the possible methods of solution available
to the cryptanalyst and attempt to describe them in sufficiently
general terms to cover any methods he might use. We then
construct our system to resist this "general" method of solution.
(2) We may construct our ciphers in such a way that breaking them
is equivalent to (or requires at some point in the process) the
solution of some problem known to be laborious. Thus, if we
could show that solving a system requires at least as much work
as solving a system of simultaneous equations in a large number
of unknowns, of a complex type, then we would have a lower bound
of sorts for the work characteristic.

The next three sections are aimed at these general
problems. It is difficult to define the pertinent ideas
involved with sufficient precision to obtain results in the form
of mathematical theorems, but it is believed that the conclusions,
in the form of general principles, are correct.

34. Generalities on the Solution of Cryptograms

After the unicity distance has been exceeded in intercepted
material, any system can be solved in principle by merely
trying each possible key until the unique solution is obtained,
i.e., a deciphered message which "makes sense" in clear. A simple
calculation shows that this method of solution (which we may call
complete trial and error) is totally impractical except when the
key is absurdly small.

Suppose, for example, we have a key of 26! possibilities,
or about 26.6 decimal digits, the same size as in simple
substitution on English. This is, by any significant measure, a
small key. It can be written on a small slip of paper, or memorized
in a few minutes. It could be registered on 27 switches each having
ten positions, or on 89 two-position switches.

Suppose further, to give the cryptanalyst every possible
advantage, that he constructs an electronic device to try keys
at the rate of one each microsecond (perhaps automatically
selecting from the results by a $\chi^2$ test for statistical
significance). He may expect to reach the right key about halfway
through, and after an elapsed time of about

$$\frac{26!}{2 \times 60^2 \times 24 \times 365 \times 10^6} \approx 6.4 \times 10^{12} \text{ years.}$$


In other words, even with a small key, complete trial
and error will never be used in solving cryptograms, except in
the trivial case where the key is extremely small, e.g., the
Caesar cipher with only 26 possibilities, or 1.4 digits. The trial
and error which is used so commonly in cryptography is of a
different sort, or is augmented by other means. If one had a
secrecy system which required complete trial and error it would
be extremely safe. Such a system would result, it appears, if
the original messages, all say of 1000 letters, were a random
selection of $2^{RN}$ from the set of all $2^{R_0 N}$ sequences of
1000 letters. If any of the simple ciphers were applied to these,
it seems that little improvement over complete trial and error
would be possible.
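As a check on the arithmetic of complete trial and error, the short Python sketch below recomputes the elapsed-time figure under the same assumptions as the text: 26! equiprobable keys, one trial per microsecond, 365-day years, and the right key found on average halfway through. The function name is ours, chosen only for illustration.

```python
import math


def exhaustive_search_years(num_keys: int, trials_per_second: float) -> float:
    """Expected years of complete trial and error.

    On average the right key turns up halfway through the key space.
    A 365-day year is assumed, as in the text.
    """
    seconds_per_year = 60 * 60 * 24 * 365
    expected_trials = num_keys / 2
    return expected_trials / trials_per_second / seconds_per_year


num_keys = math.factorial(26)  # simple substitution on English
digits = math.log10(num_keys)
years = exhaustive_search_years(num_keys, trials_per_second=1e6)
print(f"key size {digits:.1f} digits, about {years:.1e} years")
# prints: key size 26.6 digits, about 6.4e+12 years
```

Raising the trial rate by a factor of a thousand or a million changes the conclusion very little: the time is still far beyond any practical limit.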

The methods actually used often involve a great deal of
trial and error, but in a different way. First, the trials
progress from more probable to less probable hypotheses, and,
second, each trial disposes of a large group of keys, not a
single one. Thus the key space may be divided into, say, 10
subsets, each containing about the same number of keys. By
at most 10 trials one determines which subset is the correct
one. This subset is then divided into several secondary
subsets and the process repeated. With the same key size
($K = 26! \approx 4 \times 10^{26}$) we would expect about 26 × 5, or
130, trials, as compared to about $2 \times 10^{26}$ by complete trial
and error. The possibility of choosing the most likely of the
subsets first for test would improve this result even more. If
the divisions were into two compartments (the best way), only
about 90 trials would be required. Whereas complete trial and
error requires trials to the order of the number of keys, this
subdividing trial and error requires only trials to the order of
the key size in alternatives.
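The trial counts above follow from a simple model, sketched below in Python: with S-way subdivision about log K / log S stages are needed, and, following the rough estimate in the text, about S/2 trials are spent per stage (one trial per stage when S = 2, since a failed test of one half identifies the other). The helper names are illustrative only.

```python
import math


def complete_trial_and_error(num_keys: int) -> float:
    """Expected trials when keys are tried one at a time (half the keys)."""
    return num_keys / 2


def subdividing_trial_and_error(num_keys: int, subsets: int) -> float:
    """Rough expected trials when each stage splits the keys into equal subsets.

    About log(num_keys) / log(subsets) stages are needed and, following the
    text's estimate, about subsets / 2 trials are spent per stage.
    """
    stages = math.log(num_keys) / math.log(subsets)
    return stages * subsets / 2


K = math.factorial(26)
print(f"{complete_trial_and_error(K):.1e}")       # 2.0e+26
print(round(subdividing_trial_and_error(K, 10)))  # 133
print(round(subdividing_trial_and_error(K, 2)))   # 88
```

The text rounds the number of ten-way stages down to 26, giving 130, and quotes about 90 for binary division; the unrounded estimates from this model are about 133 and 88.4.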

This remains true even when the different keys have
different probabilities. The proper procedure then, to minimize
the expected number of trials, is to divide the key space into
subsets of equiprobability. When the proper subset is determined,
this is again subdivided into equiprobability subsets. If this
process can be continued, the number of trials expected when each
division is into two subsets will be

$$h = \frac{|K|}{\log 2}.$$

If each test has S possible results and each of these corresponds
to the key being in one of S equiprobability subsets, then

$$h = \frac{|K|}{\log S}$$

trials will be expected. The intuitive significance of these
results should be noted. In the two-compartment test with
equiprobability, each test yields one alternative of information
as to the key. If the subsets have very different probabilities,
as in testing a single key in complete trial and error, only a
small amount of information is obtained from the test. Thus with
26! equiprobable keys, a test of one yields only

$$-\left[\frac{26!-1}{26!}\log_2\frac{26!-1}{26!} + \frac{1}{26!}\log_2\frac{1}{26!}\right]$$

or about $10^{-25}$ alternatives of information. Dividing into S
equiprobability subsets maximizes the information obtained from
each trial at log S, and the expected number of trials is the
total information to be obtained, that is, the key size, divided
by this amount.

The question here is similar to various coin-weighing
problems that have been circulated recently. A typical
example is the following: It is known that one coin in 27 is
counterfeit, and slightly lighter than the rest. A chemist's
balance is available and the counterfeit coin is to be isolated
by a series of weighings. What is the least number of weighings
to do this? The correct answer is 3, obtained by first dividing
the coins into three groups of 9 each. Two of these are compared
on the balance. The three possible results determine the set of 9
containing the counterfeit. This set is then divided into 3
subsets of 3 and the process continued. This agrees with the
result above: each weighing has three equiprobable outcomes and
so yields log 3 of information, and log 27 / log 3 = 3. The set of
coins corresponds to the set of keys, the counterfeit coin to the
correct key, and the weighing procedure to a trial or test.
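The weighing procedure is the subdividing trial and error of the preceding paragraphs with S = 3. A minimal Python simulation, assuming the number of coins is a power of 3 and exactly one coin is light (the coin weights are made up for the example):

```python
def find_light_coin(weights: list[float]) -> tuple[int, int]:
    """Locate the single lighter coin by ternary splitting.

    Returns (index of the light coin, number of weighings used).
    Assumes len(weights) is a power of 3 and exactly one coin is lighter.
    """
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = len(candidates) // 3
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]
        weighings += 1
        left_w = sum(weights[i] for i in left)
        right_w = sum(weights[i] for i in right)
        if left_w < right_w:
            candidates = left       # the light coin tipped the left pan up
        elif right_w < left_w:
            candidates = right
        else:
            candidates = rest       # pans balance: it is in the third group
    return candidates[0], weighings


weights = [1.0] * 27
weights[17] = 0.9  # hypothetical counterfeit position
print(find_light_coin(weights))  # (17, 3)
```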


This method of solution is feasible only if the key
space can be divided into a small number of subsets, with a
simple method of determining to which subset the correct key
belongs. Stated in another way: the method works when it is
possible to solve for the key bit by bit. One does not need to
assume a complete key in order to apply a consistency test and
determine if the assumption is justified; an assumption on a part
of the key (or as to whether the key is in some large section of
the key space) can be tested.

This is one of the greatest weaknesses of most ciphering
systems. For example, in simple substitution, an assumption
on a single letter can be checked against its frequency, variety
of contact, doubles or reversals, etc. In determining a single
letter the key space is reduced by 1.4 digits from the original
26. The same effect is seen in all the elementary types of
ciphers. In the Vigenère, the assumption of two or three
letters of the key is easily checked by deciphering at other
points with this fragment and seeing whether clear emerges.
The compound Vigenère is much better from this point of view,
if we assume a fairly large number of component periods,
producing a repetition rate larger than will be intercepted.
Here as many key letters are used in enciphering each letter
as there are periods. Although this is only a fraction of the
entire key, at least a fair number of letters must be assumed
before a consistency check can be applied.

Our first conclusion then, regarding practical small-key
cipher design, is that a considerable amount of key should
be used in enciphering each small element of the message.

35. Statistical Methods

It is possible to solve many kinds of ciphers by
statistical analysis. Consider again simple substitution.
The first thing a cryptanalyst does with an intercepted
cryptogram is to make a frequency count. If the cryptogram
contains, say, 200 letters it is safe to assume that few, if
any, letters are out of their frequency groups, this being
a division into 4 sets of well-defined frequency limits. The
log of the number of keys within this limitation may be
calculated as

$$\log_{10}\,(2!\,9!\,9!\,6!) \approx 14.28$$

and the simple frequency count thus reduces the key uncertainty
by 12 digits, a tremendous gain.
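The figure 14.28 counts the keys still consistent with the frequency count: once each letter is placed in its group, a key can only permute letters within each group, so the number of survivors is the product of the factorials of the group sizes. A Python check of the arithmetic, using the group sizes 2, 9, 9, 6 of the formula:

```python
import math

group_sizes = [2, 9, 9, 6]  # the four frequency groups; 26 letters in all
remaining = sum(math.log10(math.factorial(g)) for g in group_sizes)
total = math.log10(math.factorial(26))
print(f"{remaining:.2f} digits remain, a reduction of {total - remaining:.1f} digits")
# prints: 14.28 digits remain, a reduction of 12.3 digits
```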


In general, a statistical attack proceeds as follows.
A certain statistic is measured on the intercepted cryptogram
E. This statistic is such that for all reasonable M it assumes
about the same value, $S_K$, the value depending only on the
particular key K that was used. The value thus obtained serves
to limit the possible keys to those which would give values
of S in the neighborhood of that observed. A statistic which
does not depend on K, or which varies as much with M as with K,
is not of value in limiting K. Thus in transposition ciphers
the frequency count of letters gives no information about K;
every K leaves this statistic the same. Hence one can make
no use of a frequency count in breaking transposition ciphers.
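The contrast between transposition and substitution can be seen directly. In the Python sketch below, a toy transposition permutes all positions of the message, and shifted alphabets stand in for general substitution keys; both choices, and the message, are ours for illustration. Every transposition key leaves the letter count unchanged, while different substitution keys produce different counts (with the same multiset of frequencies).

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def transpose(msg: str, order: list[int]) -> str:
    # Output position i takes the message letter at position order[i].
    return "".join(msg[j] for j in order)


def substitute(msg: str, shift: int) -> str:
    # A shifted alphabet stands in for a general substitution key.
    table = dict(zip(ALPHABET, ALPHABET[shift:] + ALPHABET[:shift]))
    return "".join(table[c] for c in msg)


msg = "ATTACKATDAWNONTHEEASTERNFRONT"
rng = random.Random(0)
orders = [rng.sample(range(len(msg)), len(msg)) for _ in range(5)]

# Every transposition key yields the same frequency count as the message...
print(all(Counter(transpose(msg, o)) == Counter(msg) for o in orders))  # True
# ...while the count after substitution depends on the key.
print(Counter(substitute(msg, 1)) == Counter(substitute(msg, 2)))       # False
```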

More precisely, one can ascribe a "solving power" to
a given statistic S. For each value of S there will be a
conditional equivocation of the key, $Q_S(K)$: the equivocation
when S has its particular value and that is all that is known
concerning the key. The weighted mean of these values

$$\sum_S P(S)\, Q_S(K)$$

gives the mean equivocation of the key when S is known, P(S)
being the a priori probability of the particular value S. The
key size |K| less this mean equivocation measures the "solving
power" of S.
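A toy computation may make the definition of solving power concrete. The Python sketch below assumes a finite key set with given probabilities and a statistic that is a deterministic function of the key (so, unlike a real cryptogram, there is no "fuzziness" from the message), and it measures information in alternatives (bits) rather than digits. The helper names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs) -> float:
    """Entropy in alternatives (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def solving_power(key_probs: dict, statistic) -> float:
    """Key entropy minus the mean conditional equivocation sum_S P(S) Q_S(K).

    statistic(k) gives the value of S produced by key k; here it is assumed
    to depend on the key alone (no variation with the message).
    """
    groups = defaultdict(dict)
    for key, p in key_probs.items():
        groups[statistic(key)][key] = p
    mean_equivocation = 0.0
    for members in groups.values():
        p_s = sum(members.values())                       # P(S)
        q_s = entropy(p / p_s for p in members.values())  # Q_S(K)
        mean_equivocation += p_s * q_s
    return entropy(key_probs.values()) - mean_equivocation


keys = {k: 1 / 8 for k in range(8)}          # eight equiprobable keys: 3 alternatives
print(solving_power(keys, lambda k: k % 2))  # 1.0: the statistic halves the key space
print(solving_power(keys, lambda k: 0))      # 0.0: a statistic that ignores the key
```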

In a strongly ideal cipher all statistics of the
cryptogram are independent of the particular key used. This is
the measure-preserving property of $T_j T_k^{-1}$ on the E space or
$T_j^{-1} T_k$ on the M space mentioned above.

There are good and poor statistics, just as there are
good and poor methods of trial and error. Indeed the trial and
error testing of a hypothesis is a type of statistic, and what
was said above regarding the best types of trials holds generally.
A good statistic for solving a system must have the following
properties:

1. It must be simple to measure.

2. It must depend more on the key than on the message
if it is meant to solve for the key. The variation
with M should not mask its variation with K.

3. The values of the statistic that can be "resolved"
in spite of the "fuzziness" produced by variation
in M should divide the key space into a number of
subsets of comparable probability, with the statistic
specifying the one in which the correct key
lies. The statistic should give us sizable information
about the key, not a tiny fraction of an
alternative.

4. The information it gives must be simple and usable.
Thus the subsets in which the statistic locates the
key must be of a simple nature in the key space.


Frequency count for simple substitution is an
example of a very good statistic.


Two methods (other than recourse to ideal systems)
suggest themselves for frustrating a statistical analysis.
These we may call the methods of diffusion and confusion. In
the method of diffusion the statistical structure of M which
leads to its redundancy is "dissipated" into long-range statistics,
i.e., into statistical structure involving long combinations
of letters in the cryptogram. The effect here is that the enemy
must intercept a tremendous amount of material to tie down this
structure, since the structure is evident only in blocks of
small individual probability. Furthermore, even when he has
sufficient material, the analytical work required is much greater,
since the redundancy has been diffused over a large number of
individual statistics. An example of diffusion of statistics
is operating on a message $M = m_1, m_2, m_3, \ldots$ with a
"smoothing" operation, e.g.,

$$y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26},$$

adding s successive letters of the message to get a letter $y_n$.
One can show that the redundancy of the y sequence is the same
as that of the m sequence, but the structure has been dissipated.
Thus the letter frequencies in y will be more nearly equal than
in m, the digram frequencies also more nearly equal, etc. Indeed
any reversible operation which produces one letter out for each
letter in and does not have an infinite "memory" has an output
with the same redundancy as the input. The statistics can never
be eliminated without compression, but they can be spread out.
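The smoothing operation and its reversibility can be sketched in a few lines of Python. Letters are coded 0 to 25 and indices are 0-based (so y[n] is the sum of m[n], ..., m[n+s-1]); the inverse is assumed to be given the first s − 1 message letters, which play the role of the finite "memory." The index of coincidence is used here only as a convenient measure of how unequal the letter frequencies are, and the sample text is made up.

```python
from collections import Counter


def smooth(m: list[int], s: int) -> list[int]:
    """y[n] = m[n] + m[n+1] + ... + m[n+s-1] (mod 26)."""
    return [sum(m[n:n + s]) % 26 for n in range(len(m) - s + 1)]


def unsmooth(y: list[int], prefix: list[int], s: int) -> list[int]:
    """Recover m from y, given the first s-1 message letters."""
    m = list(prefix)
    for n, yn in enumerate(y):
        # y[n] less the s-1 letters already recovered leaves m[n+s-1].
        m.append((yn - sum(m[n:n + s - 1])) % 26)
    return m


def index_of_coincidence(seq: list[int]) -> float:
    """Chance that two letters drawn without replacement are equal."""
    n = len(seq)
    return sum(c * (c - 1) for c in Counter(seq).values()) / (n * (n - 1))


text = "ITISPOSSIBLETOSOLVEMANYKINDSOFCIPHERSBYSTATISTICALANALYSIS"
m = [ord(c) - ord("A") for c in text]
y = smooth(m, 4)
print(unsmooth(y, m[:3], 4) == m)  # True: nothing is lost
print(index_of_coincidence(m), index_of_coincidence(y))
```

Flatter frequencies show up as an index of coincidence closer to 1/26 (about 0.038); on a sample this short the comparison is only suggestive.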

The method of confusion is to make the relation between
the simple statistics of E and the simple description of K a very
complex and involved one. In the case of simple substitution, it
was easy to describe the limitation of K imposed by the letter
frequencies of E. If the connection is very involved and confused,
the enemy can still evaluate a statistic $S_1$, say, which limits
the key to a region of the key space. This limitation, however,
is to some complex region $R_1$ in the space, folded over many times,
and he has a difficult time making use of it. A second statistic
$S_2$ limits K still further to $R_2$, hence it lies in the intersection
region $R_1 R_2$, but this does not help much because it is so
difficult to determine just what the intersection is.

To be more precise, let us suppose the key space has
certain "natural coordinates" $k_1, k_2, \ldots, k_p$ which he wishes
to determine. He measures a set of statistics $s_1, s_2, \ldots, s_n$,
and these are sufficient to determine the $k_j$. However, in the
method of confusion, the equations connecting these sets of
variables are involved and complex. We have, say,

$$\begin{aligned}
f_1(k_1, k_2, \ldots, k_p) &= s_1 \\
f_2(k_1, k_2, \ldots, k_p) &= s_2 \\
&\;\;\vdots \\
f_n(k_1, k_2, \ldots, k_p) &= s_n,
\end{aligned}$$

and all the $f_i$ involve all the $k_j$. The cryptanalyst must
solve this system simultaneously, a difficult job. In the
simple (not confused) cases the functions involve only a
small number of the $k_j$, or at least some of them do. One
first solves the simpler equations, evaluating some of the
$k_j$, and substitutes these in the more complicated equations.

The conclusion here is that for a good ciphering
system steps should be taken either to diffuse or to confuse
the redundancy (or both).

36. The Probable Word Method

One of the most powerful tools for breaking ciphers
is the use of probable words. The probable words may be
words or phrases expected in the particular message due to
its source, or they may merely be common words or syllables
which occur in any text in the language, such as the, and,
tion, that, etc.

In general, the probable word method is used as
follows. Assuming a probable word to be at some point in
the clear, the key or a part of the key is determined. This
is used to decipher other parts of the cryptogram and provide
a consistency test. If the other parts come out in clear,
the assumption is justified.
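The procedure can be illustrated on a Vigenère cipher. In the Python sketch below the period is assumed known, and the message, key and crib position are made up for the example: assuming the probable word "THE" at a given position yields three key letters, and deciphering every other position that uses those key letters supplies the consistency test.

```python
A = ord("A")


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or subtract) the periodic key letter by letter, mod 26."""
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(c) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, c in enumerate(text)
    )


def key_fragment(cipher: str, crib: str, pos: int, period: int) -> dict[int, str]:
    """Key letters implied by assuming `crib` starts at position `pos`.

    Returns {key phase: key letter} for the phases the crib covers.
    """
    return {
        (pos + j) % period: chr((ord(cipher[pos + j]) - ord(ch)) % 26 + A)
        for j, ch in enumerate(crib)
    }


def consistency_test(cipher: str, fragment: dict[int, str], period: int) -> str:
    """Decipher every position whose key phase is known; '.' elsewhere."""
    return "".join(
        chr((ord(c) - ord(fragment[i % period])) % 26 + A)
        if i % period in fragment else "."
        for i, c in enumerate(cipher)
    )


msg = "ATTACKATDAWNTHEENEMYRETREATS"  # made-up message
cipher = vigenere(msg, "LEMON")        # made-up key, period 5
fragment = key_fragment(cipher, "THE", pos=12, period=5)
print(fragment)                               # {2: 'M', 3: 'O', 4: 'N'}
print(consistency_test(cipher, fragment, 5))  # ..TAC..TDA..THE..EMY..TRE..S
```

Placing the crib at a wrong position gives a fragment whose trial decipherment elsewhere is generally not plausible clear text, which is how false assumptions are rejected.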

There are few of the classical type ciphers that
use a small key and can resist long under a probable word
analysis. From a consideration of this method we can frame
a test of ciphers which might be called the acid test. It
applies only to ciphers with a small key (less than, say, 50
digits), applied to natural languages, and not using the
ideal method of gaining secrecy. The acid test is this:
How difficult is it to determine the key, or a part of the
key, knowing a sample of message and the corresponding cryptogram?
Any system in which this is easy cannot be very resistant,
for the cryptanalyst can always make use of probable words,
combined with trial and error, until a consistent solution
is obtained.


The conditions on the size of the key make the
amount of trial and error small, and the condition about
ideal systems is necessary, since non-ideal systems automatically
give consistency checks. The existence of probable words and
phrases is implied by the condition of natural languages.
Conversely, it seems reasonable that if the key is difficult
to obtain, knowing a text and its cryptogram, then the
system should be strong.


Note that this requirement by itself is not
contradictory to the requirements that enciphering and
deciphering be simple processes. Using functional notation we
have for enciphering

$$E = f(K, M)$$

and for deciphering

$$M = g(K, E).$$

Both of these may be simple operations on their arguments
without the third equation

$$K = h(M, E)$$

being simple.

We may also point out that in investigating a new type
of ciphering system one of the best methods of attack is to
consider how the key could be determined if a sufficient
amount of M and E were given.

With a small key, the work required to solve a
system, given a large amount of data, may be expected to be
not more than a few orders of magnitude greater than the
work required to obtain the key from a small amount of data
when both M and E are known.

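The same principle of confusion can be (and must be)
used here to create difficulties for the cryptanalyst.
Given $M = m_1 m_2 \ldots m_s$ and $E = e_1 e_2 \ldots e_s$, the cryptanalyst
can set up equations for the different key elements $k_1, k_2, \ldots, k_r$
(namely the enciphering equations):

$$\begin{aligned}
e_1 &= f_1(m_1, m_2, \ldots, m_s;\, k_1, \ldots, k_r) \\
e_2 &= f_2(m_1, m_2, \ldots, m_s;\, k_1, \ldots, k_r) \\
&\;\;\vdots \\
e_s &= f_s(m_1, m_2, \ldots, m_s;\, k_1, \ldots, k_r).
\end{aligned}$$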


All is known, we assume, except the $k_i$. Each of these
equations should therefore be complex in the $k_i$, and involve
many of them. Otherwise the enemy can solve the simple ones
and then the more complex ones by substitution.

From the point of view of increasing confusion, it
is desirable to have the $f_i$ involve several $m_i$, especially
if these are not adjacent and hence less correlated. This
introduces the undesirable feature of error propagation,
however, for then each $e_i$ will generally affect several $m_i$
in deciphering, and an error will spread to all these.

We conclude that much of the key should be used in
an involved manner in obtaining any cryptogram letter from
the message, to keep the work characteristic high. Further, a
dependence on several uncorrelated $m_i$ is desirable, if some
propagation of error can be tolerated. We are led by all
three of the arguments of these sections to consider "mixing
transformations."

37. Mixing Transformations

A notion that has proven valuable in certain branches
of probability theory is the concept of a "mixing transformation."
Suppose we have a probability or measure space Ω, of total
measure 1, and a measure-preserving transformation T of the space
into itself, i.e., a transformation such that the measure of a
transformed region TR is equal to the measure of the initial
region R. The transformation is called mixing if for any function
f defined over the space and any region R

$$\lim_{n \to \infty} \int_{T^n R} f(P)\, dP = \int_R dP \int_\Omega f(P)\, dP.$$

This means that any initial region R of the space is mixed by
successive applications of T into the entire space Ω with
uniform density. In general $T^n R$ becomes a region consisting
of a large number of thin filaments spread throughout the
space. As n increases the filaments become finer and their
density more nearly constant.

An example of a mixing transformation is shown in
Fig. 21. Here measure is identified with Euclidean area.
The space is the triangle, and TP is the point a fixed distance
above point P, provided this does not go outside the triangle.
When the top of the triangle is reached, a point is transferred
first to the point directly beneath, and then over to the right
an irrational fraction of the base width. If this carries the
point beyond the right edge, the extra distance is measured from
the left edge. Successive transforms of a square region are shown
in Fig. 21. For n very large the square is turned into a uniform
grating of nearly parallel thin strips covering the triangle.

A mixing transformation in this precise sense can
occur only in a space with an infinite number of points, for
in a finite point space the transformation must be periodic.
Speaking loosely, however, we can think of a mixing
transformation as one which distributes any reasonably cohesive
region in the space fairly uniformly over the entire space.
If the first region could be described in simple terms, the
second would require very complex ones. In the case of
cryptographic interest, the original region is all of a certain
simple statistical structure; after the mix the region is
distributed and the structure diffused and confused.
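The remark that a transformation of a finite space must be periodic is easy to check for permutations: every permutation splits into cycles, and it returns to the identity after a number of steps equal to the least common multiple of the cycle lengths. A small Python sketch, with an arbitrary example permutation:

```python
from functools import reduce
from math import gcd


def cycle_lengths(perm: list[int]) -> list[int]:
    """Lengths of the cycles of a permutation of 0..n-1."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            lengths.append(length)
    return lengths


def period(perm: list[int]) -> int:
    """Smallest n > 0 with perm applied n times equal to the identity."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(perm), 1)


def power(perm: list[int], n: int) -> list[int]:
    """perm composed with itself n times."""
    result = list(range(len(perm)))
    for _ in range(n):
        result = [perm[i] for i in result]
    return result


p = [1, 2, 0, 4, 3]                   # cycles (0 1 2)(3 4)
print(cycle_lengths(p), period(p))    # [3, 2] 6
print(power(p, 6) == list(range(5)))  # True
```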

Good mixing transformations are often formed by
repeated products of two simple non-commuting operations.
See, for example, the mixing of pastry dough discussed by Hopf.*
The dough is first rolled out into a thin slab, then folded
over, then rolled, and then folded again, etc.
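The rolling-and-folding idea can be tried numerically with the baker's transformation of the unit square, a standard example of a measure-preserving mixing map (it is not the triangle example of Fig. 21). The Python sketch below starts with points spread over a small square R and tracks the fraction lying in the lower-left quarter of the space; for a mixing map this fraction should approach the quarter's area, 0.25. Floating-point doubling discards one binary digit of x per step, so only a modest number of steps is meaningful.

```python
import random


def baker(x: float, y: float) -> tuple[float, float]:
    """Stretch horizontally by 2, squash vertically by 1/2, cut and stack."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2


def fraction_in_quarter(points: list[tuple[float, float]]) -> float:
    """Fraction of points in the test region [0, 0.5) x [0, 0.5)."""
    return sum(1 for x, y in points if x < 0.5 and y < 0.5) / len(points)


rng = random.Random(1)
# Initial region R = [0, 0.1) x [0, 0.1), of area 0.01.
points = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(20000)]
history = []
for step in range(12):
    history.append(fraction_in_quarter(points))
    points = [baker(x, y) for x, y in points]

print([round(f, 3) for f in history])  # starts at 1.0, settles near 0.25
```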

In a good mixing transformation of a space with
natural coordinates $X_1, X_2, \ldots, X_S$, the point with coordinates
$X_i$ is carried by the transformation into a point with
coordinates $X_i'$, where

$$X_i' = f_i(X_1, X_2, \ldots, X_S), \qquad i = 1, 2, \ldots, S,$$

and the functions $f_i$ are complicated, involving all the
variables in a "sensitive" way. A small variation of any one,
$X_3$ say, changes all the $X_i'$ considerably. If $X_3$ passes
through its range of possible variation, the point X' traces a
long winding path around the space.


Various methods of mixing applicable to statistical
sequences of the type found in natural languages can be
devised. One which looks fairly good is to follow a preliminary
transposition by a sequence of alternating substitutions
and simple linear operations, adding adjacent letters mod 26,
for example. Thus

$$H = LSLSLT$$

where T is a transposition, L is a linear operation, and S is
a substitution.
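A toy version of such a mixing transformation is sketched below in Python. The particular transposition, substitution table, message and number of rounds are arbitrary choices for illustration, not taken from the text. L adds each letter to its predecessor mod 26, which is invertible, so the whole of H can be undone by applying the inverse steps in reverse order.

```python
import random

N = 26  # letters coded 0..25


def transpose(x: list[int], perm: list[int]) -> list[int]:
    # T: output position i takes input position perm[i].
    return [x[j] for j in perm]


def untranspose(y: list[int], perm: list[int]) -> list[int]:
    x = [0] * len(y)
    for i, j in enumerate(perm):
        x[j] = y[i]
    return x


def linear(x: list[int]) -> list[int]:
    # L: add each letter to its predecessor mod 26; the first letter is kept.
    return [x[0]] + [(x[i] + x[i - 1]) % N for i in range(1, len(x))]


def unlinear(y: list[int]) -> list[int]:
    x = [y[0]]
    for i in range(1, len(y)):
        x.append((y[i] - x[i - 1]) % N)
    return x


def substitute(x: list[int], table: list[int]) -> list[int]:
    # S: a fixed simple substitution applied letter by letter.
    return [table[v] for v in x]


def mix(x, perm, table, rounds=2):
    """Apply T, then L, then `rounds` pairs (S, then L): rounds=2 gives LSLSLT."""
    y = linear(transpose(x, perm))
    for _ in range(rounds):
        y = linear(substitute(y, table))
    return y


def unmix(y, perm, table, rounds=2):
    """Undo mix() by applying the inverse steps in reverse order."""
    inverse_table = [0] * N
    for a, b in enumerate(table):
        inverse_table[b] = a
    for _ in range(rounds):
        y = substitute(unlinear(y), inverse_table)
    return untranspose(unlinear(y), perm)


rng = random.Random(7)
msg = [ord(c) - ord("A") for c in "MIXINGTRANSFORMATIONS"]
perm = rng.sample(range(len(msg)), len(msg))  # hypothetical transposition key
table = rng.sample(range(N), N)               # hypothetical substitution table

cipher = mix(msg, perm, table)
print(unmix(cipher, perm, table) == msg)  # True

# Changing one message letter typically alters several output letters.
changed = list(msg)
changed[5] = (changed[5] + 1) % N
print(sum(a != b for a, b in zip(cipher, mix(changed, perm, table))))
```

With adjacent-letter addition a change propagates only forward a few positions per round; stronger linear steps spread it further, at the price of more error propagation.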

*E. Hopf, "On Causality, Statistics and Probability," Journal of
Mathematics and Physics, Vol. 13, pp. 51-102, 1934.

38. Ciphers of the Type THS

Suppose that H is a good mixing transformation that
can be applied to sequences of letters, and that T and S are
any two simple families of transformations, i.e., two simple
ciphers, which may be the same. For concreteness we may think
of them as both simple substitutions.

It appears that the cipher THS will be a very good
ciphering system from the standpoint of its work characteristic.
In the first place, it is clear on reviewing our arguments about
statistical methods that no simple statistics will give information
about the key; any significant statistics derived from E
must be of a highly involved and very sensitive type, since the
redundancy has been both diffused and confused by the mixing
transformation H. Also, probable words lead to a complex system of
equations involving all parts of the key (when the mix is good),
which must be solved simultaneously. The bad features of such a
system are propagation of errors and complexity of operations,
both of which get worse as the mixing of H gets better.

It is interesting to note that if the cipher T is
omitted, the remaining system is similar to S and thus no
stronger. The enemy merely "unmixes" the cryptogram by
application of $H^{-1}$ and then solves. If S is omitted, the
remaining system is much stronger than T alone if the mix is
good, but still not comparable to THS.
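The first observation can be illustrated with a toy system in which the mix H is a publicly known "add adjacent letters" operation and S is a substitution (an affine table, chosen only for illustration). Because H contains no key, the enemy can apply H^{-1} at once and is left with an ordinary simple substitution cryptogram whose letter-frequency profile is that of the message.

```python
from collections import Counter

N = 26


def mix(x: list[int]) -> list[int]:
    # A keyless mix H (for illustration): add each letter to its predecessor mod 26.
    return [x[0]] + [(x[i] + x[i - 1]) % N for i in range(1, len(x))]


def unmix(y: list[int]) -> list[int]:
    # H^{-1}: available to anyone, since H uses no key.
    x = [y[0]]
    for i in range(1, len(y)):
        x.append((y[i] - x[i - 1]) % N)
    return x


def substitute(x: list[int], table: list[int]) -> list[int]:
    return [table[v] for v in x]


msg = [ord(c) - ord("A") for c in "THEENEMYMERELYUNMIXESTHECRYPTOGRAM"]
table = [(7 * v + 3) % N for v in range(N)]  # secret key of S (an affine table)

cryptogram = mix(substitute(msg, table))  # the system H S, with T omitted
stripped = unmix(cryptogram)              # the enemy's first step, needing no key

# What remains is an ordinary simple substitution of the message:
print(stripped == substitute(msg, table))                                   # True
print(sorted(Counter(stripped).values()) == sorted(Counter(msg).values()))  # True
```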

The basic principle here of simple ciphers separated
by a mixing transformation can of course be extended. For
example, one could use

$$T_k H_1 S_j H_2 R_l$$

with two mixes and three simple ciphers. One can also simplify
by using the same ciphers, and even the same keys (inner
product), as well as the same mixing transformations. This
might well simplify the mechanization of such systems.

The mixing transformation which separates the two
(or more) appearances of the key acts as a kind of barrier
for the enemy: it is easy to carry a known element over this
barrier, but an unknown (the key) does not go easily.

By supplying two sets of unknowns, the key for S and
the key for T, and separating them by the mixing transformation
H, we have "tangled" the unknowns together in a way that makes
solution very difficult.

Although systems constructed on this principle
would be extremely safe, they possess one grave disadvantage.
If the mix is good then the propagation of errors is bad.
A transmission error of one letter will affect several
letters on deciphering.


39. The Compound Vigenère

In the compound Vigenère several keys of lengths
$d_1, d_2, \ldots, d_s$ are written under the message and added to it
modulo 26 to obtain the cryptogram. The result is a Vigenère
with a key of special type, whose repetition is of period d, the
least common multiple of $d_1, d_2, \ldots, d_s$. If we have three
keys of periods 2, 3, 5, the total period is 30 and the total
key size is (2 + 3 + 5) × 1.41 = 14.1 digits. The situation is then

$$\begin{array}{lcllllll}
M   & = & m_1 & m_2 & m_3 & m_4 & m_5 & m_6 \;\ldots \\
K_1 & = & a_1 & a_2 & a_1 & a_2 & a_1 & a_2 \;\ldots \\
K_2 & = & b_1 & b_2 & b_3 & b_1 & b_2 & b_3 \;\ldots \\
K_3 & = & c_1 & c_2 & c_3 & c_4 & c_5 & c_1 \;\ldots \\
E   & = & e_1 & e_2 & e_3 & e_4 & e_5 & e_6 \;\ldots
\end{array}$$

with

$$e_1 = m_1 + a_1 + b_1 + c_1, \qquad e_2 = m_2 + a_2 + b_2 + c_2, \qquad \ldots \pmod{26}.$$

If we assume M and E known then, letting $h_i = e_i - m_i \pmod{26}$,

$$\begin{aligned}
a_1 + b_1 + c_1 &= h_1, & a_2 + b_3 + c_1 &= h_6, \\
a_2 + b_2 + c_2 &= h_2, & a_1 + b_1 + c_2 &= h_7, \\
a_1 + b_3 + c_3 &= h_3, & a_2 + b_2 + c_3 &= h_8, \\
a_2 + b_1 + c_4 &= h_4, & a_1 + b_3 + c_4 &= h_9, \\
a_1 + b_2 + c_5 &= h_5, & a_2 + b_1 + c_5 &= h_{10},
\end{aligned}$$

and so on.

These equations are easily solved for the key, although not as
easily as in the simple Vigenère or other simple ciphers. As
the number of constituent periods increases, the solution
becomes more involved and time consuming. In any case we have
a system of simultaneous equations, each involving s of the
total of $B = \sum_i d_i$ unknowns. The unicity point will occur at
about 2B letters, and if several times this amount of material is
intercepted no great difficulty should be encountered in breaking
the cipher, provided s is not more than, say, 6 or 8. With the
first 9 primes as periods we have a key size of 100 letters, or
about 141 digits; the unicity distance is about 200 letters, and
the key does not repeat for 223,092,870 letters. This system,
although much better than such methods as simple substitution,
transposition and simple Vigenère with equivalent key size,
does not utilize the available key fully in making the
cryptanalyst work for the solution. The equations involve only s
of the B key unknowns, and those in a simple fashion. The
equations easily combine and reduce to eliminate unknowns. If
a large amount of material is available, compared to the unicity
distance, particular sets of equations can be combined to
eliminate unknowns very easily. The system possesses the important
advantage, however, of not expanding errors. One incorrect
letter of cryptogram produces one incorrect letter of decipherment.
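The period and key-size figures of this section can be checked with a short Python sketch. The combined key stream is the letter-by-letter sum (mod 26) of the component keys, so it repeats with the least common multiple of the periods; the key letters used below are arbitrary.

```python
import math
from functools import reduce


def lcm_all(periods: list[int]) -> int:
    return reduce(lambda a, b: a * b // math.gcd(a, b), periods, 1)


def compound_key_stream(keys: list[list[int]], length: int) -> list[int]:
    """Sum (mod 26) of the periodic component keys written under the message."""
    return [sum(k[i % len(k)] for k in keys) % 26 for i in range(length)]


digits_per_letter = math.log10(26)  # about 1.41 digits per key letter

small = [2, 3, 5]
print(lcm_all(small), round(sum(small) * digits_per_letter, 1))  # 30 14.1

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
print(sum(primes), round(sum(primes) * digits_per_letter), lcm_all(primes))
# 100 141 223092870

keys = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]  # arbitrary key letters, periods 2, 3, 5
stream = compound_key_stream(keys, 60)
print(stream[:30] == stream[30:])  # True: the stream repeats with period 30
```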


By relatively simple changes this system could be
strengthened considerably. If the equations for the key
elements (with M and E known) could be made into higher-degree
equations rather than linear ones, the difficulty of solution
would increase tremendously. This could easily be done in
a mechanical device by successive multiplications (mod 26)
of the key letters according to some prearranged scheme.


40. Incompatibility of the Criteria for Good Systems

The five criteria for good secrecy systems given in
section 12 appear to have a certain incompatibility when
applied to a natural language with its complicated statistical
structure. With artificial languages having a simple statistical
structure it is possible to satisfy all requirements
simultaneously, by means of the ideal type ciphers. In natural
languages it seems that a compromise must be made and the
valuations balanced against one another with a view toward
the particular application.

If any one of the five criteria is dropped, the other four can be satisfied fairly well, as the following examples show.

1. If we omit the first requirement (amount of secrecy), any simple cipher such as simple substitution will do. In the extreme case of omitting this condition completely, no cipher at all is required and one sends the clear.

2. If the size of the key is not limited, the Vernam system can be used.

3. If complexity of operation is not limited, various extremely complicated types of enciphering process can be used. The modified compound Vigenère described above, with many different periods compounded, is fairly satisfactory as an example here, although it falls down somewhat on the key size condition. Ideal systems and enciphered codes are also fair examples, although not too good from the propagation of error point of view.

4. If we omit the propagation of error condition, systems of the type THS would be very good, although somewhat complicated.

5. If we allow large expansion of message, various systems are easily devised where the "correct" message is mixed with many "incorrect" ones (misinformation). The key determines which of these is correct.

A rough argument for the incompatibility of the five conditions may be given as follows.

From condition 5, secrecy systems essentially as studied in this paper must be used; i.e., no great use of nulls, etc. Perfect and ideal systems are excluded by condition 2 and by conditions 3 and 4, respectively. The high secrecy required by condition 1 must then come from a high work characteristic, not from a high equivocation characteristic. If the key is small, the system simple, and the errors do not propagate, probable word methods will generally solve the system fairly easily, since we then have a fairly simple system of equations for the key.

This reasoning is too vague to be conclusive, but the general idea seems quite reasonable. Perhaps if the various criteria could be given quantitative significance, some sort of an exchange equation could be found involving them and giving the best physically compatible sets of values. The two most difficult to measure numerically are the complexity of operations and the complexity of statistical structure of the language.


Appendix 1 

Deduction of $-\sum p_i \log p_i$

It will be shown that the measure of choice $-\sum p_i \log p_i$ is a logical consequence of three quite reasonable assumptions about the desired properties of such a measure. The three assumptions are:

(1) There exists a function $C(p_1, p_2, \ldots, p_n)$, continuous in the $p_i$, measuring the amount of "choice" when there are $n$ possibilities with probabilities $p_i$.

(2) $C$ has the property that if a given choice be broken down into two successive choices, the total amount of choice is the weighted sum of the individual choices. For example, suppose the choice is from 4 possibilities A, B, C, D with probabilities .1, .2, .3, .4. This can be broken down into a preliminary choice between the pair A, B and the pair C, D. Pair A, B has a total probability .1 + .2 = .3 and pair C, D probability .3 + .4 = .7. If pair A, B is chosen, a second choice between A and B must be made with probabilities $\frac{.1}{.1 + .2} = \frac{1}{3}$ and $\frac{.2}{.1 + .2} = \frac{2}{3}$. If pair C, D is chosen, a second choice between C and D must be made with probabilities $\frac{3}{7}$ and $\frac{4}{7}$. Thus broken down we have a preliminary amount of choice $C(.3, .7)$, and .3 of the time a secondary choice of $C\left(\frac{1}{3}, \frac{2}{3}\right)$, while .7 of the time the secondary choice is $C\left(\frac{3}{7}, \frac{4}{7}\right)$. Our condition requires that the total choice $C(.1, .2, .3, .4)$ be the same as the weighted sum of the different choices when decomposed, weighted in accordance with the frequency of occurrence. Thus we require in this case

$$C(.1, .2, .3, .4) = C(.3, .7) + .3\, C\left(\tfrac{1}{3}, \tfrac{2}{3}\right) + .7\, C\left(\tfrac{3}{7}, \tfrac{4}{7}\right).$$

(3) If $A(n) = C\left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$ is the choice when there are $n$ equally likely possibilities, then $A(n)$ is monotonic increasing in $n$.

Theorem. Under these three assumptions

$$C(p_1, p_2, \ldots, p_n) = -K \sum p_i \log p_i$$

where $K$ is a positive constant.
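As a numerical illustration, take $K = 1$ with base-2 logarithms (a choice of unit made only for this sketch). The function $-\sum p_i \log p_i$ then satisfies the decomposition demanded by assumption (2) for the example .1, .2, .3, .4. It also increases over equal probabilities, as assumption (3) requires.

```python
from math import log2

def choice(*probs):
    """-sum p_i log2 p_i, i.e. the formula with K = 1 and base-2 logarithms."""
    return -sum(p * log2(p) for p in probs if p > 0)

whole = choice(0.1, 0.2, 0.3, 0.4)
decomposed = (choice(0.3, 0.7)
              + 0.3 * choice(1 / 3, 2 / 3)
              + 0.7 * choice(3 / 7, 4 / 7))
assert abs(whole - decomposed) < 1e-12

# Assumption (3): A(n) = choice(1/n, ..., 1/n) = log2 n increases with n.
A = [choice(*([1 / n] * n)) for n in range(1, 11)]
assert all(x < y for x, y in zip(A, A[1:]))
```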


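From condition (2) we can decompose a choice from $s^m$ equally likely possibilities into a series of $m$ choices each from $s$ equally likely possibilities and obtain

$$A(s^m) = m\, A(s).$$

Similarly

$$A(t^n) = n\, A(t).$$

We can choose $n$ arbitrarily large and find an $m$ to satisfy

$$s^m \le t^n < s^{m+1}.$$

Thus, taking logarithms and dividing by $n \log s$,

$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n}, \qquad \text{or} \qquad \left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \epsilon,$$

where $\epsilon$ is arbitrarily small. Now from the monotonic property of $A(n)$,

$$A(s^m) \le A(t^n) \le A(s^{m+1}),$$

$$m\, A(s) \le n\, A(t) \le (m+1)\, A(s).$$

Hence, dividing by $n\, A(s)$,

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n}, \qquad \text{or} \qquad \left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \epsilon,$$

$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| < 2\epsilon.$$

Since $\epsilon$ is arbitrary, $A(t)/A(s) = \log t/\log s$, so that

$$A(t) = K \log t,$$

where $K$ must be positive to satisfy (3).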
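The bracketing step can be checked with exact integer arithmetic. The sketch below assumes $s = 2$ and $t = 3$, values chosen only for illustration. For each $n$ it finds the $m$ with $s^m \le t^n < s^{m+1}$ and confirms that $m/n$ lies within $1/n$ below $\log t / \log s$.

```python
from math import log

def largest_m(s, t, n):
    """Largest m with s**m <= t**n, found with exact integer arithmetic."""
    target, m, power = t ** n, 0, s
    while power <= target:       # power is always s**(m + 1)
        m += 1
        power *= s
    return m

s, t = 2, 3
ratio = log(t) / log(s)
for n in (1, 10, 100, 1000):
    m = largest_m(s, t, n)
    assert s ** m <= t ** n < s ** (m + 1)     # the bracketing inequality
    assert m / n <= ratio < m / n + 1 / n      # hence |m/n - log t/log s| < 1/n
    print(n, m, m / n)
```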

Now suppose we have a choice from $n$ possibilities with commensurable probabilities $p_i = \frac{n_i}{\sum n_i}$, where the $n_i$ are integers. We can break down a choice from $\sum n_i$ possibilities into a choice from $n$ possibilities with probabilities $p_1, \ldots, p_n$ and then, if the $i$th was chosen, a choice from $n_i$ with equal probabilities. Using condition (2) again, we equate the total choice from $\sum n_i$ as computed by the two methods:

$$K \log \sum n_i = C(p_1, \ldots, p_n) + K \sum p_i \log n_i.$$



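Hence, since $\sum p_i = 1$,

$$C = K\left[\sum p_i \log \sum n_i - \sum p_i \log n_i\right] = -K \sum p_i \log \frac{n_i}{\sum n_i} = -K \sum p_i \log p_i.$$

If the $p_i$ are incommensurable, they may be approximated by rationals, and the same expression must hold by our continuity assumption. The choice of the coefficient $K$ is a matter of convenience and amounts to the choice of a unit of measure.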
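The two ways of counting the choice can also be compared numerically. The sketch below uses hypothetical integer weights $n_i = 1, 2, 3, 4$, which reproduce the probabilities .1, .2, .3, .4 of the earlier example, and takes $K = 1$ with base-2 logarithms.

```python
from math import log2

def choice(probs):
    """-sum p_i log2 p_i (K = 1, base-2 logarithms)."""
    return -sum(p * log2(p) for p in probs if p > 0)

n_i = [1, 2, 3, 4]                     # hypothetical integer weights
total = sum(n_i)
p = [n / total for n in n_i]

lhs = log2(total)                      # one choice among sum(n_i) equal cases
rhs = choice(p) + sum(pi * log2(n) for pi, n in zip(p, n_i))
assert abs(lhs - rhs) < 1e-12
```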





Appendix 2 

Proof of Theorem 4

Select any message $M_1$ and group together all cryptograms that can be obtained from $M_1$ by an enciphering operation $T_i$. Let this class of cryptograms be $C_1'$. Group with $M_1$ all $M$ that can be obtained from $M_1$ by $T_j^{-1} T_i M_1$, and call this class $C_1$. The same $C_1'$ would be obtained if we started with any other $M$ in $C_1$, since

$$T_s T_j^{-1} T_i M_1 = T_l M_1.$$

Similarly the same $C_1$ would be obtained.

Choosing an $M$ (if any exist) not in $C_1$, we construct $C_2$ and $C_2'$ in the same way. Thus we obtain the residue classes with properties (1) and (2). Let $M_1$ and $M_2$ be in $C_1$ and suppose

$$M_2 = T_1^{-1} T_2 M_1.$$

If $E_1$ is in $C_1'$ and can be obtained from $M_1$ by

$$E_1 = T_\alpha M_1 = T_\beta M_1 = \cdots = T_\eta M_1,$$

then

$$E_1 = T_\alpha T_2^{-1} T_1 M_2 = T_\beta T_2^{-1} T_1 M_2 = \cdots = T_\lambda M_2 = \cdots = T_\mu M_2.$$

Thus each $M_i$ in $C_1$ transforms into $E_1$ by the same number of keys. Similarly each $E_i$ in $C_1'$ is obtained from any $M$ in $C_1$ by the same number of keys. It follows that this number of keys is a divisor of the total number of keys, and hence we have properties (3) and (4).
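The construction can be tried on a toy pure cipher. The sketch below assumes, purely for illustration, that the keys are all six transpositions of three positions acting on 3-letter words over the alphabet {a, b}. It builds the classes $C_i$ and $C_i'$ as in the proof. For each class it checks that every pair $(M, E)$ is joined by the same number of keys, and that this number divides the total number of keys.

```python
from itertools import permutations, product

KEYS = list(permutations(range(3)))   # hypothetical toy key set: all 3! transpositions

def encipher(key, msg):
    return "".join(msg[i] for i in key)

def decipher(key, crypt):
    out = [""] * len(crypt)
    for pos, i in enumerate(key):
        out[i] = crypt[pos]
    return "".join(out)

messages = ["".join(w) for w in product("ab", repeat=3)]
remaining = set(messages)
classes = []
while remaining:
    m1 = min(remaining)                                  # any unclassified message
    crypt_class = {encipher(k, m1) for k in KEYS}        # C'_i
    msg_class = {decipher(kj, encipher(ki, m1))          # C_i: T_j^{-1} T_i M_1
                 for ki in KEYS for kj in KEYS}
    classes.append((msg_class, crypt_class))
    remaining -= msg_class

for msg_class, crypt_class in classes:
    counts = {sum(encipher(k, m) == e for k in KEYS)
              for m in msg_class for e in crypt_class}
    assert len(counts) == 1            # same number of keys for every (M, E) pair
    (c,) = counts
    assert len(KEYS) % c == 0          # and that number divides the key count
    print(sorted(msg_class), c)
```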



Appendix 3

Equivocation of Message for Random Cipher 

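As before, let $M_1, \ldots, M_s$ be the high-probability messages and let $M_{s+1}, \ldots, M_u$ have zero probability. Let $P(m_1, m)$ be the probability of just $m_1$ lines going from a particular $E$, say $E_1$, to a particular high-probability $M$, say $M_1$, with a total of $m$ lines to all high-probability $M$. Then

$$P(m_1, m) = \binom{k}{m}\left(\frac{s}{u}\right)^{m}\left(1 - \frac{s}{u}\right)^{k-m}\binom{m}{m_1}\left(\frac{1}{s}\right)^{m_1}\left(1 - \frac{1}{s}\right)^{m-m_1}.$$

The probability of intercepting an $E$ with $m$ lines to high-probability $M$'s is $\frac{m}{ks}$.

The $Q(M)$ expected can be thought of as contributed to by the various $M_i$ in the high-probability group. Thus $M_1$ contributes

$$-\frac{m_1}{m} \log \frac{m_1}{m} = \frac{m_1}{m} \log \frac{m}{m_1}$$

if there are $m_1$ lines to $M_1$ and a total of $m$ to high-probability $M$'s. The expected $Q$ is then

$$Q(M) = u\, s \sum_{m_1} \sum_{m} P(m_1, m)\, \frac{m}{ks}\, \frac{m_1}{m} \log \frac{m}{m_1}.$$

The factor $u$ sums over the various $E_i$, and the $s$ sums over the different $M_i$ ($i = 1, \ldots, s$). Hence

$$Q(M) = \frac{u}{k} \sum_{m_1} \sum_{m} P(m_1, m)\, m_1 \left[\log m - \log m_1\right].$$

The term

$$\sum_{m_1} P(m_1, m)\, m_1,$$

summed on $m_1$, gives the expected $m_1$ when $m$ lines go to high-probability $M$'s, i.e., $\frac{m}{s}$, times the probability $P(m)$ of $m$ such lines. Hence the first term is

$$\frac{u}{ks} \sum_{m} m\, P(m) \log m = Q(K)$$

by our previous work. The second term is

$$-\frac{u}{k} \sum_{m_1} \sum_{m} P(m_1, m)\, m_1 \log m_1.$$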
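The two facts used above can be checked for small cases: the sum over $m_1$ gives $\frac{m}{s} P(m)$, and $Q(M)$ splits into $Q(K)$ minus the second term. The sketch assumes hypothetical parameters $k = 16$, $s = 8$, $u = 64$. It also assumes that each of the $k$ lines from $E$ lands on one of the $u$ messages independently and uniformly, which gives the binomial forms used here. Equivocations are in bits.

```python
from math import comb, log2

def p_lines(m, k, s, u):
    """P(m): exactly m of the k lines reach the s high-probability messages."""
    q = s / u
    return comb(k, m) * q**m * (1 - q)**(k - m)

def p_joint(m1, m, k, s, u):
    """P(m1, m): m lines reach the high-probability set, m1 of them reach M1."""
    return p_lines(m, k, s, u) * comb(m, m1) * (1 / s)**m1 * (1 - 1 / s)**(m - m1)

k, s, u = 16, 8, 64                    # hypothetical small parameters
for m in range(k + 1):
    expected_m1 = sum(m1 * p_joint(m1, m, k, s, u) for m1 in range(m + 1))
    assert abs(expected_m1 - (m / s) * p_lines(m, k, s, u)) < 1e-12

pairs = [(m1, m) for m in range(1, k + 1) for m1 in range(1, m + 1)]
q_k = (u / (k * s)) * sum(m * p_lines(m, k, s, u) * log2(m) for m in range(1, k + 1))
second = (u / k) * sum(p_joint(m1, m, k, s, u) * m1 * log2(m1) for m1, m in pairs)
q_m = (u / k) * sum(p_joint(m1, m, k, s, u) * m1 * log2(m / m1) for m1, m in pairs)
assert abs(q_m - (q_k - second)) < 1e-9
print(round(q_k, 4), round(q_m, 4))
```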

If the expected $m_1$ is $\ll 1$, this term is small, since it vanishes for $m_1 = 0$ or $1$. The expected $m_1$ is $k/u$. Thus beyond this point $Q(M)$ approaches closely to $Q(K)$. The point in question is where $|K| = |M| = R_0 N$.

If the expected $m_1$ is $\gg 1$, the $\log m_1$ can be taken out as $\log m_1 \approx \log \frac{k}{u}$, and we have

$$-\frac{u}{k} \log \frac{k}{u} \sum_{m_1} \sum_{m} P(m_1, m)\, m_1 = -\log \frac{k}{u} = |M| - |K|.$$

In this region then

$$Q(M) = |M| - |K| + Q(K),$$

but here $Q(K) = |K| - RN$, and therefore

$$Q(M) = |M| - RN.$$

In the transition region $m_1$ is about 1 and $m$ will in ordinary cases be very large. It is admissible then to replace $P(m_1, m)$ by $P(m_1)$, since this will not depend on $m$ to any extent except for values of $m$ of very small probability. Thus we obtain for this region

$$Q(M) = Q(K) - \frac{u}{k} \sum_{m_1} P(m_1)\, m_1 \log m_1.$$

The sum has the same form as our expression for $Q(K)$, but with $1/u$ in place of $s/u$. The calculations for $Q(K)$ can be used, therefore, with only a change of the $N$ scale by a factor of …


Appendix 4 

Key Appearance in Simple Substitution with Independent Letters

If successive letters are chosen independently and the different letters have probabilities $p_1, p_2, \ldots, p_S$, we may calculate the expected number of different letters when $N$ letters have been intercepted. It is

$$L(N) = S - \sum_{i=1}^{S} (1 - p_i)^N.$$

To prove this, imagine all possible sequences of $N$ letters written down, each with a frequency corresponding to its probability, giving a total of, say, $A$ sequences. Letter 1 does not appear in $(1 - p_1)^N A$ of these; letter 2 does not appear in $(1 - p_2)^N A$; etc. Therefore the total number of letters missing from sequences is

$$A \sum_{i} (1 - p_i)^N.$$

Dividing by $A$ gives us by definition the expected number of missing letters from a random sequence, $\sum_i (1 - p_i)^N$. The number of different letters expected in a sequence is the total number of letters $S$ minus this, giving the desired result.
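The formula can be checked against a direct average over every sequence of $N$ letters. The sketch below does this for a hypothetical three-letter alphabet with probabilities .5, .3, .2, weighting each sequence by its probability as in the proof. It also evaluates the equal-probability case.

```python
from itertools import product
from math import prod

def expected_distinct(probs, n):
    """L(N) = S - sum_i (1 - p_i)^N for independently chosen letters."""
    return len(probs) - sum((1 - p) ** n for p in probs)

def expected_distinct_brute(probs, n):
    """Average number of distinct letters over all sequences of length n."""
    total = 0.0
    for seq in product(range(len(probs)), repeat=n):
        weight = prod(probs[i] for i in seq)
        total += weight * len(set(seq))
    return total

probs = [0.5, 0.3, 0.2]                 # hypothetical letter probabilities
for n in range(1, 6):
    assert abs(expected_distinct(probs, n) - expected_distinct_brute(probs, n)) < 1e-12

# Equal probabilities reduce to S - S(1 - 1/S)^N.
S = 26
print(round(expected_distinct([1 / S] * S, 50), 2))
```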

If all the $p_i$ are equal, this reduces to $S - S\left(1 - \frac{1}{S}\right)^N$, an exponential approach to $S$. In the general case there is a series of exponentials with different time constants, corresponding to the different $p_i$, which are added to give $L(N)$.

With the frequencies of normal English used for the $p_i$, we obtain the curve shown in Fig. 25, along with an experimental curve. The small discrepancy can be attributed to the influences of nearby letters. In English there is less tendency to double letters than there would be if the letters were independent but with the same probabilities. For English the probability of a doubled digram is

$$\sum_i P(i, i) = .0315,$$

while if letters were independent it would be

$$\sum_i p_i^2 = .0670.$$

Appendix 5

A Theoretical Case Where All Invariant Statistics of E Are Independent of K

By an invariant statistic of a sequence of letters

$$E = \ldots, m_{-2}, m_{-1}, m_0, m_1, m_2, m_3, \ldots$$

we will mean a statistic which is averaged along the length of the sequence $E$. More precisely, a statistic of the form

$$\lim_{n \to \infty} \frac{1}{2n+1}\left[F(E_{-n}) + \cdots + F(E_{-1}) + F(E) + F(E_1) + F(E_2) + \cdots + F(E_n)\right]$$

where $F$ is any function whose argument is a possible sequence, and $E_{\pm n}$ is the sequence $E$ shifted $n$ letters to the right or left. Such statistics as the relative frequency of a given letter, of a given n-gram, transition frequencies, and frequencies with which letter $i$ is followed by letter $j$ at a distance $n$ are all of this type.

We will describe a system in which every invariant statistic which the cryptanalyst can construct from the (infinite) intercepted $E$ is independent of both $K$ and $M$, and thus gives no information to him. This effect, and still more, occurs with the ideal ciphers of course, but here it is obtained independently of the original message statistics and without any matching of the cipher to the language.

Let $N$ be a "random" sequence of letters:

$$N = \ldots, n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, \ldots$$

This is supposedly a known sequence (to the enemy) and thus a part of the system, not of the key. Apply any simple cipher to the message and then add $N$ letter by letter to the result (mod 26). The "sum" is the enciphered message. It is evident that any invariant statistic on $E$ will be (with probability 1) the same as that for a random sequence. Hence it is independent of both $K$ and $M$.

We need hardly add that such a system is easily broken: the enemy merely subtracts $N$ from $E$ and then solves the simple residual cipher, which may often be done with invariant statistics.
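Both halves of the argument can be sketched in a few lines. Two details are assumptions made for illustration: the simple cipher is taken to be a random substitution, and the public sequence $N$ is generated by a seeded pseudorandom generator. Adding $N$ flattens the single-letter statistics of $E$. Subtracting the known $N$ hands back the substituted text, whose frequency profile is merely a relabelling of the message's.

```python
import random
from collections import Counter

ALPHABET = 26
rng = random.Random(7)

# Public "random" sequence N: part of the system, known to the enemy.
noise = [rng.randrange(ALPHABET) for _ in range(100_000)]
# Secret key: a simple substitution (a permutation of the 26 letters).
substitution = list(range(ALPHABET))
rng.shuffle(substitution)

text = "THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG" * 3000
message = [ord(c) - ord("A") for c in text[:100_000]]
substituted = [substitution[m] for m in message]
cryptogram = [(x + n) % ALPHABET for x, n in zip(substituted, noise)]

# Single-letter frequencies of E are flat, whatever K and M were ...
counts = Counter(cryptogram)
expected = len(cryptogram) / ALPHABET
print(max(abs(c - expected) for c in counts.values()))

# ... yet subtracting N leaves only the simple substitution, whose
# frequency profile is a relabelling of the message's profile.
residual = [(e - n) % ALPHABET for e, n in zip(cryptogram, noise)]
assert residual == substituted
assert sorted(Counter(residual).values()) == sorted(Counter(message).values())
```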

Appendix 6 

Maximum Repetition Rate in Compound Systems for a Given Total Key Size

We consider briefly the question of how to arrange the component periods in a compound Vigenère or transposition system to obtain the longest period for a given total key size. If the component periods are $P_1, P_2, \ldots, P_s$, it is clear that they should be coprime. Otherwise the total key, which is $\sum P_i$, could be reduced without changing the period, which is the least common multiple of the $P_i$, merely by deleting a factor which appears in several of the $P_i$ from all but one. Also each $P_i$ must be a power of a prime, for if it contains two primes it can be divided into these parts, reducing the key and not affecting the period. Thus the component periods are selections from the series of primes and powers of primes

A: 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...

the selection being pairwise coprime.

It appears from empirical evidence that the best set of component periods, for a given total size $S$, is found by the following process.

1. Determine the largest $M$ such that $\sum_{j=1}^{M} p_j \le S$, where the $p_j$ are the primes in increasing order. This is the maximum number of periods when the periods are coprime, and is the number of periods to be used.

2. Choose from the sequence A $M$ elements, consecutive except for the fact that no prime is represented more than once, the $M$ elements being as great as possible with sum $\le S$.

3. If the sum is $< S$, move as many as possible of the elements in this block up a notch in the sequence, still satisfying the conditions on the sum and coprimality.

4. Repeat step 3 on either part of the original block if possible. This process eventually ends and apparently gives the proper decomposition.

For example, with $S = 50$, the sum of the first 6 primes is 41 and of the first 7 is 58. Hence 6 periods will be used. We have

$$11 + 9 + 8 + 7 + 5 + 3 = 43,$$

$$13 + 11 + 9 + 8 + 7 + 5 = 53,$$

hence we start with the block 11, 9, 8, 7, 5, 3. The elements 11, 9, 8, 7 can be moved up a notch, giving

$$13 + 11 + 9 + 8 + 5 + 3 = 49.$$

No further improvement seems possible, and we obtain

$$P = 13 \times 11 \times 9 \times 8 \times 5 \times 3 = 154{,}440.$$

The products and sums of the first $n$ primes are given below.

n          1   2   3    4     5      6       7        8          9
p_n        2   3   5    7    11     13      17       19         23
Sum        2   5  10   17    28     41      58       77        100
Product    2   6  30  210  2310  30030  510510  9699690  223092870
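The empirical procedure can be checked against an exhaustive search. The sketch below is a brute-force search written for illustration, not the procedure described above. It tries every set of pairwise coprime prime powers (at most one power of each prime) whose sum is at most $S$, and keeps the set with the largest product, which equals the period because the set is coprime. For $S = 50$ it returns the periods 4, 5, 7, 9, 11, 13, with period 180,180. The block 13, 11, 9, 8, 5, 3 contains both 9 and 3, which share the factor 3, so its least common multiple is 51,480 rather than its product 154,440.

```python
from math import lcm, prod

def prime_powers_by_prime(limit):
    """Map each prime p <= limit to its powers p, p^2, ... not exceeding limit."""
    is_prime = [True] * (limit + 1)
    powers = {}
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
            powers[p], q = [], p
            while q <= limit:
                powers[p].append(q)
                q *= p
    return powers

def best_periods(total):
    """Pairwise coprime prime-power periods with sum <= total and the largest period."""
    powers = prime_powers_by_prime(total)
    primes = sorted(powers)
    best = (1, ())

    def search(index, budget, chosen):
        nonlocal best
        if prod(chosen) > best[0]:
            best = (prod(chosen), tuple(chosen))
        for j in range(index, len(primes)):
            for q in powers[primes[j]]:
                if q > budget:
                    break
                search(j + 1, budget - q, chosen + [q])

    search(0, total, [])
    return best

print(best_periods(50))
print(prod([13, 11, 9, 8, 5, 3]), lcm(13, 11, 9, 8, 5, 3))
```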



Figures 1-25. (Images not reproduced; surviving labels: MESSAGE, FIG. 6, FIG. 8, FIG. 10, FIG. 16, FIG. 19, FIG. 20, FIG. 22, FIG. 23.)

September 19 , l*4&-ll£S-CX3-yO 

Introduction.

In classical mechanics one considers situations where the state of a system is described by a set of numbers, the coordinates of the phase space of the system, and the dynamical behavior is controlled by a set of ordinary differential equations. Such a system is entirely determinate; the future is completely specified by the present state and the dynamical equations, since these differential equations have, in general, a unique solution passing through a given point.

In other branches of physics (heat flow, Brownian motion, diffusion, etc.) there are situations which can be called completely statistical. The path of a particle of gas is described only statistically, and no determinate or mean behavior occurs. In this case one studies the flow of probability, which is described by a partial differential equation of the heat flow type.

In the present memorandum I consider a partial differential equation in which both effects occur. There is a definite "mean" motion of a system, determinate in character, carrying its representative point through phase space in the classical manner, with a superimposed statistical effect continually perturbing it from this path.


In such a case the future coordinates of the system cannot be precisely predicted. Only a probability distribution function can be determined for the future time, whose value times the volume element $d\tau$ is the probability that the system will be in the volume element $d\tau$ around the point in question. For a short time the system is substantially determinate, the distribution being concentrated around a point which moves according to the determinate part of the equation. As the statistical effects come into play, this distribution broadens out and in general approaches a limiting distribution which is independent of the initial state of the system.

In some respects the situation is similar to that in quantum mechanics, where systems are described only by probabilities (or more precisely by wave functions whose squared amplitudes are probabilities). There is this difference, however: in quantum mechanics even the initial state cannot be precisely described, due to the uncertainty principle. Conjugate variables cannot both be measured simultaneously with exactness. In the systems we consider here there are assumed to be no difficulties of this nature; all coordinates can be simultaneously and precisely measured. This corresponds to the difference in the fundamental equation from that of quantum mechanics. Schrödinger's equation is for the wave function $\psi$, while the equation considered here deals directly with the probability density. Thus the present work is adapted to "molar" statistical situations.

This sort of analysis may be expected to apply to many problems where the actual situation is quite complicated but a partial theoretical analysis is possible. This partial analysis is used for the determinate part of the equation, and the other complex disturbing effects are treated statistically. Such situations may occur in economics, sociology, history, etc., as well as in many engineering and physical problems.

G. R. Stibitz, in a series of memoranda, has considered a similar problem in connection with the stability of a periodically closed servo system. In his case the phase space of the system consisted of a set of discrete points, and the fundamental equation is a difference equation. In the case considered here (which was suggested by Stibitz' work) the variables are continuous and a differential equation is involved.

In a determinate system with an $n$-dimensional phase space, whose motion is described by differential equations, we have

$$\frac{dx^i}{dt} = f^i(x^1, x^2, \ldots, x^n), \qquad i = 1, 2, \ldots, n \qquad (1)$$

where the $x^i$ are coordinates in the phase space and $t$ is time.

If we start with a probability distribution of points in phase space,

$$P(x^1, \ldots, x^n, t),$$

giving the probability density in the differential volume element about $x^1, \ldots, x^n$ at time $t$, this distribution changes with time. Its motion is described by the partial differential equation

$$\frac{\partial P}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial x^i}\left(f^i P\right),$$

or in tensor notation

$$\frac{\partial P}{\partial t} = -\left(f^i P\right)_{,i}.$$

This is evident if we think of $P$ as a fluid density whose velocity field is $f^i$.

Now suppose that as the representative point of the system moves about the phase space it is continually subject to small disturbances, which are of a probability type. Thus the system tends to follow the solution of (1) but is continually being disturbed by the probability effects, which may be thought of as something like molecular collisions of the surrounding medium on a moving particle. We are interested in the limiting case where the disturbing effects are very rapid but very small. If we assume that the disturbance is homogeneous and isotropic, this can be represented by an additional term in the equation, of the heat flow type:

$$\frac{\partial P}{\partial t} = -\left(f^i P\right)_{,i} + a \sum_{i=1}^{n} \frac{\partial^2 P}{\partial (x^i)^2},$$

where $a$ is a positive constant measuring the strength of the disturbances.


In the more general case certain directions may be preferred, and certain regions may have greater perturbation effects. Thus there will generally be a small ellipsoid of probability about each point, and a corresponding positive definite quadratic form $a^{ij}$ defined over the phase space. This form describes the local statistical perturbing effects at each point, and the equation then assumes the form

$$\frac{\partial P}{\partial t} = \left(a^{ij} P\right)_{,ij} - \left(f^i P\right)_{,i}. \qquad (2)$$

This partial differential equation governs the flow of probability in the phase space. With an ensemble of systems distributed at $t = 0$ according to $P_0(x^i)$, the distribution at a later time $t_1$ is the solution of (2) for $t = t_1$, with $P = P_0$ at $t = 0$.

The equation (2) is linear and of parabolic type (in $t$). In the $x^i$ it is elliptical, since $a^{ij}$ is positive definite.

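The total probability in the phase space remains constant, for

$$\frac{d}{dt}\int P\, d\tau = \int \left[\left(a^{ij} P\right)_{,ij} - \left(f^i P\right)_{,i}\right] d\tau = \oint_S \left[\left(a^{ij} P\right)_{,j} - f^i P\right] n_i\, dS,$$

and the last integral, taken over a sufficiently large surface, vanishes.

If $a^{ij}$ is positive definite and both $a^{ij}$ and $f^i$ are continuous in the phase space, then the distribution $P$ approaches a unique limit as $t \to \infty$. This limit is either zero everywhere, corresponding to the probability going off to infinity, or a definite limiting distribution $P_\infty$ with

$$\int P_\infty\, d\tau = 1.$$

The limiting distribution must satisfy the elliptical equation obtained by setting $\frac{\partial P}{\partial t} = 0$.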

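To prove that the distribution approaches a limit, let $P_1$ and $P_2$ be two different solutions of the general equation. Then the difference $\varphi = P_1 - P_2$ also satisfies the equation, and $\varphi$ is positive in one region $R$ and negative in the remainder of the space. Consider the quantity

$$Q = \int_R \varphi\, d\tau.$$

$Q$ must decrease, for

$$\frac{dQ}{dt} = \int_R \frac{\partial \varphi}{\partial t}\, d\tau + \int_S \varphi\, v\, dS,$$

where $S$ is the surface of the region $R$ and $v$ is the outward velocity of this surface. Since $\varphi$ vanishes on the surface, the second term is zero, and the first is

$$\int_R \left[\left(a^{ij} \varphi\right)_{,ij} - \left(f^i \varphi\right)_{,i}\right] d\tau.$$

Volume integrals of divergences transform by the usual theorems into surface integrals:

$$\int_S \left[\left(a^{ij} \varphi\right)_{,j} - f^i \varphi\right] n_i\, dS.$$

The second term again vanishes since $\varphi = 0$ on $S$. In the first term, since $\varphi > 0$ inside $R$ and $\varphi = 0$ on $S$, the outward normal $n_i$ is in the direction of $-\varphi_{,i}$ at any point, and we have

$$\left(a^{ij} \varphi\right)_{,j} n_i = a^{ij} \varphi_{,j}\, n_i \le 0.$$

Thus for any initial distributions $P_1$ and $P_2$, $Q$ is decreasing, and the solutions approach the same limit.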

It «^ is SeuiUiMOOS, *ftt tots ft <U»«aatHuiUy t 

PwiH b#> o&u lienors, sad tfcs ▼sotor SUE ftl— aa i t— tsassj » 

Ths saouat of tiiia di««oatiault/ Is £U «& fcy 
ft 1 * - ?j) • - If* - ?*) » 

*frtr« tht b***sd «a4 uafcsjrr »d l«n«r* ***** ts> ti»« two tide* 
of t&« dltesoiiiuUt/. Tims 

SMMyiftlsai Aft Mm *»a i1£m o# s*sft i 1 nana ** ****** g> gj - 

Xft tSM sUpisst Oft« &l»«ASiS*%l •*»* wft fcm 

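If we start with a spike of probability density at one point, the immediate behavior can be described in detail. For small $t$, near this point we may take $a^{ij}$ and $f^i$ to be constant. Due to the $f^i$ the spike starts moving with a velocity $f^i$, while the probability term $a^{ij}$ spreads it out. If we shift coordinates from $x^i$ to

$$y^i = x^i - f^i t,$$

we have

$$\left.\frac{\partial P}{\partial t}\right|_{y} = \left.\frac{\partial P}{\partial t}\right|_{x} + f^i \frac{\partial P}{\partial x^i},$$

and the equation becomes

$$\frac{\partial P}{\partial t} = a^{ij} \frac{\partial^2 P}{\partial y^i\, \partial y^j}.$$

This is the equation for heat flow in an anisotropic medium. Thus in the $y^i$ coordinates the spike diffuses out into a normal distribution with quadratic form $A_{ij}$ for the first short interval of time:

$$P = \frac{\sqrt{\det A}}{(4\pi t)^{n/2}} \exp\left(-\frac{A_{ij}\, y^i y^j}{4t}\right),$$

where $A_{ij}$ is the inverse form of $a^{ij}$.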

feliaa Toioauy rial* gaj aom^aaaaoaa at*u«ti«ai .wta. 

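One particular case of interest is that in which $f^i$ is a linear function of the coordinates and $a^{ij}$ is constant throughout the space. For a one-dimensional phase space, with $f = -bx$ and $b > 0$, the general equation then assumes the form

$$\frac{\partial P}{\partial t} = b\, \frac{\partial (xP)}{\partial x} + a\, \frac{\partial^2 P}{\partial x^2}.$$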

A general solution for this case has been found. It may be described as follows. If the initial distribution is a $\delta$ function, i.e., the system (or ensemble) is known to have a definite value of $x$ at $t = 0$, say $x_0$, then at $t_1$ the distribution is normal, with center and variance

$$\bar{x} = x_0\, e^{-bt_1}, \qquad \sigma^2 = \frac{a}{b}\left(1 - e^{-2bt_1}\right).$$

Thus the mean $\bar{x}$ decreases exponentially toward zero, as $e^{-bt}$, just as the system would if the statistical effects were absent. The variance $\sigma^2$ increases exponentially to a limiting value $a/b$ with half the time constant.

To prove that this is the solution, it is only necessary to substitute in the equation. As $t \to \infty$, the distribution approaches a normal one centered on zero with $\sigma^2 = a/b$.

With an arbitrary initial distribution $P_0(x)$, the solution can be written as an integral, using the methods of solution of heat flow problems:

$$P(x, t) = \int_{-\infty}^{\infty} P_0(x_0)\, \frac{1}{\sqrt{2\pi \sigma^2(t)}} \exp\left(-\frac{\left(x - x_0 e^{-bt}\right)^2}{2\sigma^2(t)}\right) dx_0, \qquad \sigma^2(t) = \frac{a}{b}\left(1 - e^{-2bt}\right).$$
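The normal solution can be cross-checked by simulating individual systems. The sketch below is illustrative only. It assumes that the one-dimensional linear equation, $\partial P/\partial t = b\,\partial(xP)/\partial x + a\,\partial^2 P/\partial x^2$, is the probability flow of the stochastic equation $dX = -bX\,dt + \sqrt{2a}\,dW$, and it integrates that equation by the Euler-Maruyama method. The parameters are hypothetical. The ensemble mean and variance at time $t$ should approach $x_0 e^{-bt}$ and $\frac{a}{b}(1 - e^{-2bt})$.

```python
import math
import random

def simulate_ensemble(x0, a, b, t_end, steps, paths, seed=0):
    """Euler-Maruyama paths of dX = -b X dt + sqrt(2a) dW, all starting at x0."""
    rng = random.Random(seed)
    dt = t_end / steps
    kick = math.sqrt(2 * a * dt)
    finals = []
    for _ in range(paths):
        x = x0
        for _ in range(steps):
            x += -b * x * dt + kick * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals

a, b, x0, t = 0.5, 1.0, 2.0, 1.0            # hypothetical parameters
xs = simulate_ensemble(x0, a, b, t, steps=200, paths=4000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

mean_theory = x0 * math.exp(-b * t)                   # x0 e^{-bt}
var_theory = (a / b) * (1 - math.exp(-2 * b * t))     # (a/b)(1 - e^{-2bt})
print(round(mean, 3), round(mean_theory, 3), round(var, 3), round(var_theory, 3))
```

The agreement is statistical, so it is only as close as the number of simulated systems and the step size allow.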

The same general results hold in the $n$-dimensional case when $f^i$ is linear and $a^{ij}$ is constant. A spike of probability spreads into a normal distribution. The center follows the determinate trajectory, and the quadratic form, which takes the place of the standard deviation, increases exponentially toward a definite limit. The equations of the motion are much more complicated in this case, however. The equations for the final distribution are given in the appendix.

In the one-dimensional linear case, if we start with a normal distribution centered on zero with $\sigma^2 = a/b$, the distribution remains stationary with time. An individual system executes statistical motion about zero, and the ensemble of systems produces an ensemble of time series. This ensemble can be seen to be equivalent to thermal noise which has been passed through a filter with transfer characteristic

$$Y(\omega) = \frac{1}{b + i\omega},$$

leading to a power spectrum for the noise proportional to $\frac{1}{b^2 + \omega^2}$.

To show this, the autocorrelation may be calculated. Starting from a system whose value at $t = 0$ is $x$, the distribution at $t = \tau$ is normal with mean $x e^{-b\tau}$. Averaging $x \cdot x e^{-b\tau}$ over the stationary distribution gives

$$R(\tau) = \frac{a}{b}\, e^{-b|\tau|},$$

and this is the autocorrelation. The power spectrum is the Fourier transform of this:

$$\Phi(\omega) = \int_{-\infty}^{\infty} R(\tau)\, e^{-i\omega\tau}\, d\tau = \frac{2a}{b^2 + \omega^2}.$$
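The Fourier transform step can be checked numerically. The sketch below uses hypothetical values $a = 0.5$, $b = 1$ and the convention $\Phi(\omega) = \int R(\tau) e^{-i\omega\tau}\, d\tau$. It integrates $R(\tau)\cos\omega\tau$ by the trapezoid rule over a wide symmetric interval; the sine part drops out because $R$ is even. The result is compared with $2a/(b^2 + \omega^2)$.

```python
import math

def autocorrelation(tau, a, b):
    """R(tau) = (a/b) e^{-b|tau|} for the stationary linear case."""
    return (a / b) * math.exp(-b * abs(tau))

def spectrum_numeric(omega, a, b, half_width=40.0, steps=100_000):
    """Trapezoid rule for the integral of R(tau) cos(omega tau) over [-L, L]."""
    h = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        tau = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * autocorrelation(tau, a, b) * math.cos(omega * tau)
    return total * h

a, b = 0.5, 1.0                          # hypothetical parameters
for omega in (0.0, 0.5, 2.0):
    closed_form = 2 * a / (b * b + omega * omega)
    print(omega, round(spectrum_numeric(omega, a, b), 6), round(closed_form, 6))
```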



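Appendix

In one dimension, with $a = a(x)$ and $f = f(x)$, the equation is

$$\frac{\partial P}{\partial t} = \frac{\partial}{\partial x}\left[\frac{\partial}{\partial x}\big(a(x) P\big) - f(x) P\right].$$

In the steady state $\partial P/\partial t = 0$, so that

$$\frac{d}{dx}\big(a(x) P\big) - f(x) P = C,$$

and $C = 0$ since $P$ and $dP/dx \to 0$ as $x \to \pm\infty$. Hence

$$\frac{d}{dx}\big(a(x) P\big) = f(x) P,$$

and we obtain as the stationary solution

$$P = \frac{A}{a(x)} \exp\left(\int_0^x \frac{f(\xi)}{a(\xi)}\, d\xi\right),$$

where $A$ is determined by the condition $\int_{-\infty}^{\infty} P\, dx = 1$. For this to be possible it is necessary, roughly, that $f(x) > 0$ for large negative $x$ and $f(x) < 0$ for large positive $x$, so that $P$ is of exponentially decreasing type as $x \to \pm\infty$. With $f(x) = -bx$ and $a$ constant, this gives the normal distribution with $\sigma^2 = a/b$.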


By R. B. Blackman, H. W. Bode, and C. E. Shannon*

* Bell Telephone Laboratories.

THE problem of data smoothing in fire control arises because observations of target positions are never completely accurate. If the target is located by radar, for example, we may expect errors in range running from perhaps 10 to 50 yards in typical cases. Angular errors may vary from perhaps one to several mils, corresponding, at representative ranges, to yardage errors about equal to those mentioned for range. Similar figures might be cited for the errors involved in optical tracking by various devices. Evidently these errors in observation will generate corresponding errors in the final aiming orders delivered by the fire-control system.

A data-smoothing device is a means for minimizing the consequences of observational errors by, in effect, averaging the results of observations taken over a period of time. The simplest example of data smoothing is furnished by artillery fire at a fixed land target. Here the principal parameter is the range to the target. While individual determinations of the range may be somewhat in error, a reliable estimate can ordinarily be obtained by taking the simple average of a number of such observations. This example, however, is scarcely a representative one for problems in data smoothing generally. The errors involved are small and the averaging process is an elementary one. Moreover, the data-smoothing process is not of very decisive importance in any case, since any errors which may exist in the estimated range can normally be wiped out merely by observing the results of a few trial shots.

More representative problems in data smoothing arise when we deal with a moving target. In this case errors in observational data may be much more serious, since they determine not only the present position of the target but also the rates used in calculating how much the target will move during the time it takes the projectile to reach it. An illustration is furnished by antiaircraft fire against distant airplanes. Suppose, for example, that in observing the target's position we make two errors of opposite sign and a second apart, of 25 yards each. Then the apparent motion of the airplane is in error by 50 yards per second. Since the time of flight of an antiaircraft shell in reaching its target may be as high as 30 seconds or more, such an error might produce a miss of the order of 1 mile. It is clear that in any comparable situation the effect of observational errors in determining the target rate will be much greater than the position error alone would suggest. The function of the data-smoothing network, averaging the data so that even moderately reliable rates can be obtained as a basis for prediction, therefore becomes a critically important one.

Aside from magnifying the consequences of small errors in target position, the motion of the target complicates the data-smoothing problem in two other respects. The first is the fact that it gives us only a brief time in which to obtain suitable firing orders. The total engagement is likely to last for only a brief time, and in any case it is necessary to make use of the data before the target has time to do something different. Thus the averaging process cannot take too long. The second complication results from the fact that the true target position is an unknown function of time rather than a mere constant. Thus many more possibilities are open than would be the case with fixed targets, and the problem of averaging to remove the effects of small errors is correspondingly more elusive.
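The rate-error example can be reproduced with a small sketch. All numbers in it are hypothetical choices for illustration: a target moving at 100 yards per second, observed once a second for 10 seconds, with position errors alternating between +25 and -25 yards, and an assumed time of flight of 30 seconds. A rate taken from the last two observations is 50 yards per second in error. A least-squares straight line fitted to all ten observations, used here as one simple form of averaging, brings the rate error down to about 1.5 yards per second.

```python
def two_point_rate(positions, dt):
    """Rate from the last two observations only (no smoothing)."""
    return (positions[-1] - positions[-2]) / dt

def least_squares_rate(positions, dt):
    """Slope of the straight line fitted by least squares to all observations."""
    n = len(positions)
    times = [i * dt for i in range(n)]
    t_mean = sum(times) / n
    x_mean = sum(positions) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

true_speed, dt, n = 100.0, 1.0, 10            # hypothetical target and sampling
errors = [25.0 if i % 2 == 0 else -25.0 for i in range(n)]
observed = [true_speed * i * dt + e for i, e in enumerate(errors)]

raw_error = two_point_rate(observed, dt) - true_speed
smoothed_error = least_squares_rate(observed, dt) - true_speed
time_of_flight = 30.0                         # seconds, assumed
miss_miles = abs(raw_error) * time_of_flight / 1760
print(abs(raw_error), round(miss_miles, 2), round(abs(smoothed_error), 2))
```

The smoothing helps here only because the target really does fly a straight line over the averaging interval. That is the second complication described above.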

The intimate relation between data smooth- 
ing and target mobility explains why the data- 
smoothing problem is relatively new in war- 
fare. The problem emerged as a serious one 
only recently, with the introduction of new and 
highly mobile military devices. The airplane is, 
of course, the archetype of such mobile instru- 
ments, and we have already mentioned the 
data-smoothing problem as it appears in anti- 
aircraft fire. Since the relative velocity of air- 
plane and ground is the same whether we sta- 
tion ourselves on one or the other, however, the 




mobility of the airplane produces essentially 
the same sort of problem in the design of bomb- 
sights also. Another field exists in plane-to- 
plane gunnery. Although they are somewhat 
slower, the mobility of such vehicles as tanks 
and torpedo boats is still considerable enough 
to create a serious problem here also. Future 
examples may be centered largely on robot 
missiles. It is interesting to notice that a 
guided missile may present a problem in data 
smoothing either because it belongs to the 
enemy, and is therefore something to shoot at, 
or because it belongs to us, and requires 
smoothing to correct errors in the data which 
it uses for guidance. The tendency to higher 
and higher speeds in all these devices must 
evidently mean that fire control generally, and 
data smoothing as one aspect of fire control, 
must become more and more important, unless 
war making can be ended. 

Very mobile instruments of war, such as 
the airplane, began to make their appearance 
in World War I, but there was insufficient time 
during that war to make much progress with 
the fire-control problems which such instru- 
mentalities imply. In the interval between 
World War I and World War II, however, a 
considerable number of fire-control devices, 
such as bombsights and antiaircraft compu- 
ters, were developed. The principal attention 
in the design of these devices, however, was 
on the kinematical aspects of the situation. 
Although a number of them included fairly 
successful methods of minimizing the effects of 
observational errors,ᵃ it seems fair to say that
in the interval between the two wars there
was no general appreciation of the existence of
the data-smoothing problem as such.

a Most of these solutions depended upon the use of special types of tracking systems. Examples are found in the use of regenerative tracking in bombsights and antiaircraft computers, or in the determination of rates from a precessing gyroscope or an aided laying mechanism in an antiaircraft tracking head. So far as their effect on the data-smoothing characteristics of the overall circuit is concerned, these devices are equivalent to simple types of smoothing networks inserted directly in the prediction system. This is discussed in more detail under the heading "Exponential Smoothing," Section 10.1.

It follows that the theory of data smoothing advanced in this monograph is principally the result of experience gained in World War II. More specifically, it is the product of the experience of the authors with a series of projects, largely sponsored by Division 7 of NDRC, concerned with the design of electrical antiaircraft directors. In addition, it draws largely on the results of a number of other investigations, also NDRC sponsored. The possible key importance of data smoothing in the design of fire-control systems was recognized by Division 7 early in the course of its activities, and the emphasis placed upon it in a number of projects led to the accumulation of a much larger body of results than might otherwise have been available.

Data smoothing is developed here in terms of concepts familiar in communication engineering. This is a natural approach, since data smoothing is evidently a special case of the transmission, manipulation, and utilization of intelligence. The other principal, and perhaps still more fundamental, approach to data smoothing is to regard it as a problem in statistics. This is the line followed in the classic work¹ by Norbert Wiener.ᵇ For reasons which are brought out later, Wiener's theory is not used in the present monograph as a basis for the actual design of data-smoothing networks. Because of its fundamental interest, however, a sketch of Wiener's theory is included. The authors apologize for any mutilation of the theory caused by the attempt to simplify it and compress it into a brief space.

The present monograph falls roughly into two dissimilar halves. The first half, consisting of the first three or four chapters, includes a discussion of the general theoretical foundations of the data-smoothing problem, the best established ways of approaching the problem, the assumptions they involve, and the authors' judgment concerning the assumptions which best fit the tactical facts. The last chapter, which contains a fragmentary discussion of alternative data-smoothing possibilities lying outside the main theoretical framework of the monograph, may also be included in this part.

The rest of the monograph is concerned with the technique of designing specific data-smoothing structures. A fairly elaborate and detailed treatment is given here, in the belief that the problem of actually realizing a suitable data-smoothing device is, in some ways at least, as difficult as that of deciding what the general properties of such a device should be. The technique, as given, draws heavily upon the highly developed resources of electric network theory. For this reason the discussion is couched entirely in electrical language, although the authors realize, of course, that equivalent nonelectrical solutions may exist. For the benefit of readers who may not be familiar with network theory, the monograph includes an appendix summarizing the principles most needed in the main text.

b Wiener is also responsible for providing tools which permit the gap between the statistical and communication points of view to be bridged.

Two further remarks may be helpful in un- 
derstanding the monograph. The first concerns 
the relation between data smoothing and the 
overall problem of prediction in a fire-control 
circuit. These two are coupled together in the 
title of the monograph, and it is clear that the 
connection between them must be very close, 
since, as we saw earlier, small irregularities in 
input data are likely to be serious only as they 
affect the extrapolation used to determine the 
future position of a moving target. In the 
statistical approach, in fact, data smoothing 
and prediction are treated as a single problem 
and a single device performs both operations. 

In the attack which is treated at greatest 
length in the monograph a certain distinction 
between data smoothing and prediction can be 
made. To simplify the exposition as much as 
possible, the explicit discussion in the monograph is directed
principally at data smoothing. This, however, is not intended to suggest
that there is any real cleavage between the 
two problems or that the analysis as developed 
in the monograph does not also bear, by impli- 
cation, upon the prediction problem. Any the- 
ory of data smoothing must rest ultimately 
upon some hypothesis concerning the path of 
the target, and the exact statement of the as- 
sumptions to be made is in many ways the most 
important as well as the most difficult part of 
the problem. The same assumptions, however, 
are also involved in the extrapolation to the 
future position of the target. It is thus impos- 
sible to solve the data-smoothing problem with- 
out also implying what the general nature of 
the prediction process will be. For example, 
the formulation given in Chapter 9 amounts to
the assumption that the target path is specified
by a set of geometrical parameters corresponding
to components of velocity, acceleration, etc.
The data-smoothing process centers about the
problem of obtaining reliable values for these 
parameters. To obtain a complete prediction 
thereafter, it is merely necessary to multiply 
the parameter values thus obtained by suitable 
functions of time of flight and add the results 
to the present position of the target. 
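As a concrete case of this last step, the sketch below assumes that the smoothed parameters for one coordinate are a velocity and an acceleration, and that the "suitable functions of time of flight" are simply $t_f$ and $t_f^2/2$, as for a constant-acceleration course. The function name and these particular functions are illustrative assumptions, not the formulation of Chapter 9.

```python
def predict_future_position(present, velocity, acceleration, time_of_flight):
    """Extrapolate one coordinate of target position (hypothetical sketch).

    The smoothed path parameters (velocity, acceleration) are multiplied by
    assumed functions of time of flight, t_f and t_f**2 / 2, and the results
    are added to the present position.
    """
    t_f = time_of_flight
    return present + velocity * t_f + acceleration * t_f ** 2 / 2.0


# A target at x = 5000 moving at 150 per second, no acceleration, t_f = 10 s.
print(predict_future_position(5000.0, 150.0, 0.0, 10.0))  # prints 6500.0
```

With zero acceleration this reduces to the straight-line extrapolation (present position plus rate times time of flight) used in the circuit of Figure 1.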

The other general remark concerns the tacti- 
cal criteria used in evaluating the performance 
of a data-smoothing system. This turns out to 
be one of the most important aspects of the 
whole field. It is assumed here that the tactical 
situation is similar to that of antiaircraft fire 
against high-altitude bombers in World War 
II. The defense can be regarded as successful if 
only a fairly small fraction of the targets en- 
gaged are destroyed. On the other hand, the 
lethal radius of the antiaircraft shell is so small 
that it is also quite difficult to score a kill. 
Under these circumstances we are interested
only in increasing the number of very well 
aimed shots. 

When we combine these assumptions with 
the path assumptions described in Chapter 9 
we are led to the data-smoothing solution for- 
mulated here, in preference to the solution ob- 
tained with the statistical approach. On the 
other hand, we might equally well envisage a 
situation in which the target contained an 
atomic bomb or some other very destructive 
agent, so that it becomes very important to 
intercept it, while the lethal radius of the anti- 
aircraft missile is correspondingly increased, 
so that great accuracy is not needed for a kill. 
In this situation our interest would be focused 
on the problem of minimizing the probability 
of making large misses, and the solution furnished
by the statistical approach would be approximately
the best obtainable.ᶜ

c In fairness to the statistical solution it should be pointed out that it is also the best obtainable, without regard to the lethal radius of the shell, if we replace the path assumptions made in Chapter 9 by a "random phase" assumption. The path assumptions in Chapter 9 are almost at the opposite pole from a random phase assumption, and represent a deliberate overstatement, made in order to illustrate the theoretical situation as clearly as possible.


Chapter 7 


One of the principal difficulties in any
treatment of data smoothing is that of 
stating exactly what the problem is and what 
criteria should be applied in judging when we 
have a satisfactory solution. It is consequently 
necessary to embark upon a rather extensive 
general discussion of the data-smoothing prob- 
lem before it is possible to consider specific 
methods of designing data-smoothing struc- 
tures. This preliminary survey will occupy 
Chapters 7, 8, and 9. As a first step this chap- 
ter will describe two of the general ways in 
which the data-smoothing problem can be ap- 
proached mathematically. The formulation of 
the problem which is finally reached in Chap- 
ter 9 is not the one which is most obviously 
suggested by these approaches. This, however, 
does not lessen their value in characterizing 
the problem broadly. 



In an actual fire-control system the data- 
smoothing problem is usually made fairly spe- 
cific because of the particular geometry 
adopted in the computer. It may be helpful 
to have some particular case in mind as a 
touchstone in interpreting the general discus- 
sion. For this purpose the most appropriate 
example is furnished by long-range land-based
antiaircraft fire, since most of the analysis 
described in this monograph was developed 
originally for its application to this problem. 
It is usually assumed in the antiaircraft prob- 
lem that the target flies in a straight line at 
constant speed, and in one case at least the 
computer operates by converting the input data 
into Cartesian coordinates of target position 
and differentiating these to find the rates of 
travel in the several Cartesian directions. 
These rates form the basis of the extrapolation. 

The process is illustrated in Figure 1. The input coordinates are transformed into electrical voltages proportional to $x_P$, $y_P$, and $z_P$, the Cartesian coordinates of present position, in the coordinate converter at the left of the diagram. The extrapolation for $x$ is shown explicitly. It consists essentially in differentiating to find the $x$ component of target velocity, multiplying the derivative by the time of flight $t_f$, and adding the result to $x_P$ to find $x_F$, the predicted future value of $x$. A similar procedure fixes $y_F$ and $z_F$. After the addition of certain ballistic corrections, these three coordinates of future position are transformed into gun aiming orders in the coordinate converter shown at the right of the drawing. This last unit also provides the time of flight required as a multiplier in the extrapolation.

Figure 1. Data-smoothing and prediction circuit.

The small irregularities in the input data 
caused by tracking errors are greatly magni- 
fied by the process of differentiation. It is thus 
necessary to smooth the rates considerably if 
a reliable extrapolation is to be secured. The 
data-smoothing network for the x coordinate is 
represented by $N_1$ in Figure 1. Since the Cartesian
velocity components are theoretically constant
if the assumption of a straight line
course at constant speed is correct, a data- 
smoothing network in this computer must be 
essentially an averaging device which gives 
an appropriately weighted average of the fluc- 
tuating instantaneous rate values fed to it. The 
problem of "smoothing a constant" is given 
special attention in Chapter 10. Aside from the 
particular circuit of Figure 1, we may, of 
course, be required to smooth a constant when- 
ever the prediction is based upon an assumed 
geometrical course involving one or more pa- 
rameters which are isolated in the circuit. 
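A small simulation, under stated assumptions, shows both effects at once: the differentiated rate is far noisier than the position data it comes from, and a weighted average of the fluctuating rates recovers the underlying constant. The sketch assumes a straight-line course at constant speed sampled every `dt` seconds, independent Gaussian tracking errors, and a first-order exponential smoother standing in for the rate-smoothing network of Figure 1 (the kind of simple network the footnote on exponential smoothing refers to). The function name and all parameter values are arbitrary illustrations.

```python
import random


def simulate_rate_smoothing(true_rate=150.0, noise_sd=5.0, dt=0.1,
                            steps=2000, alpha=0.02, seed=1):
    """Compare raw differentiated rates with exponentially smoothed rates.

    Assumed model: observed x_P(t) = true_rate * t + Gaussian tracking error.
    Returns (rms error of raw rate, rms error of smoothed rate), measured
    over the second half of the run, after the smoother has settled.
    """
    rng = random.Random(seed)
    prev_x = rng.gauss(0.0, noise_sd)  # observed position at t = 0
    smoothed = 0.0
    raw_sq, smooth_sq, count = 0.0, 0.0, 0
    for n in range(1, steps + 1):
        x = true_rate * n * dt + rng.gauss(0.0, noise_sd)
        raw_rate = (x - prev_x) / dt                 # differentiation by finite difference
        smoothed += alpha * (raw_rate - smoothed)    # exponentially weighted average
        prev_x = x
        if n > steps // 2:  # skip the start-up transient of the smoother
            raw_sq += (raw_rate - true_rate) ** 2
            smooth_sq += (smoothed - true_rate) ** 2
            count += 1
    return (raw_sq / count) ** 0.5, (smooth_sq / count) ** 0.5


raw_rms, smooth_rms = simulate_rate_smoothing()
print(f"raw rate rms error: {raw_rms:.1f}, smoothed: {smooth_rms:.1f}")
```

For this model the raw rate error should be near $\sqrt{2}\,\sigma/\Delta t \approx 71$ (a 5-unit position error becomes a 71-unit rate error), while the smoothed error is of order 1; the price is the lag of the smoother, which is why the choice of weighting is the real design question.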





In addition to smoothing the rates we can, 
if we like, attempt to smooth the irregularities 
in present position also. A network to accomplish
this purpose is indicated by the broken-line
structure $N_2$ in Figure 1. Of course, in dealing
with the present position we are no longer
smoothing a constant, but suitable structures
can be obtained by methods described later.
However, the effect of tracking errors in the
present position circuit is so much less than it
is in the rate circuit that $N_2$ can generally
be omitted.

Geometrical assumptions of the sort implied 
in Figure 1 are helpful in visualizing the prob- 
lem, and they are of course of critical impor- 
tance in determining what the final data- 
smoothing device will be. It is important not 
to make explicit assumptions of this kind too 
early in the formal analysis, however, since 
the meaning of such assumptions is one of the 
aspects of the general problem which must be 
investigated. For example, it is apparent that 
no airplane in fact flies exactly a straight line, 
nor flies a straight line for an indefinite period. 
In detail, the solution of the data-smoothing 
problem depends very largely on how we treat 
these departures from the idealized straight 
line path. For the present, consequently, it will 
be assumed that the input data are presented 
to the data-smoothing and predicting devices 
in terms of some generalized coordinates, the 
nature of which we will not inquire into too
closely. A given coordinate might, for example, 
be a velocity, a radius of curvature, an angle of 
dive or climb, or any other quantity which 
would be directly useful in making a predic- 
tion, or it might be a simple position coordi- 
nate such as an azimuth or an altitude. 

The data-smoothing and predicting opera- 
tion itself is assumed to be performed by linear 
invariable devices. Aside from the fact that 
this assumption is, of course, a tremendously 
simplifying one, it also fits the data-smoothing 
problem very nicely, as the problem is formu- 
lated in this chapter. With other formulations, 
however, it appears that somewhat better re- 
sults may be obtainable from variable devices 
or devices including more or less radical 
amounts of nonlinearity. These possibilities are 
discussed briefly in Chapter 14. 


Figure 1 illustrates a distinction between 
two possible methods of looking at the data- 
smoothing problem which it is advisable to 
establish for future purposes. In describing the 
x system in Figure 1 we laid emphasis on the 
particular networks $N_1$ and $N_2$. It is clear, however,
that the complete $x$ circuit with input $x_P$
and output $x_F$ is a network having overall
transmission properties which can be studied.
Since $t_f$ will normally vary with time, the network
is not, strictly speaking, an invariable
one, but the changes of $t_f$ are ordinarily too
slow to make this an essential consideration. 

When it is necessary to make a distinction 
between these points of view, a network such 
as $N_1$, which is merely an element in the prediction
process, will be called a data-smoothing
structure. An overall circuit, providing data 
smoothing and prediction in one step, will be 
called a data-smoothing and prediction net- 
work, or simply a prediction network. Al- 
though these points of view have been illus- 
trated for rectangular coordinates, they obvi- 
ously apply also in many other situations. For 
example, we might go so far as to apply the 
overall point of view to a complete circuit from 
input azimuth, say, to output azimuth. 

Both points of view are taken from time to 
time in the monograph. When possible, how- 
ever, principal attention has been given to the 
limited data-smoothing problem. This tends to 
simplify the discussion, since the limited prob- 
lem is evidently more concrete than the overall 
prediction problem. Moreover, it permits us to 
deal lightly with such questions as the particu- 
lar choice of coordinates in which the smooth- 
ing operations are conducted, since it assumes 
that the general kinematical framework of pre- 
diction has already been decided upon. On the 
other hand, the overall point of view is more 
effective in certain situations, and it is the only 
natural one in the statistical treatment de- 
scribed in the next section. 


The most direct and perhaps the most general
approach to data smoothing consists in regarding
it as a problem in time series. This
is the approach used by Wiener in his well-known
work.¹ It essentially classifies data
smoothing and prediction as a branch of statis- 
tics. The input data, in other words, are 
thought of as constituting a series in time 
similar to weather records, stock market prices, 
production statistics, and the like. The well- 
developed tools of statistics for the interpreta- 
tion and extrapolation of such series are thus 
made available for the data-smoothing and 
prediction problem. 

To formulate the problem in these terms, let $f(t)$ represent the true value of one of the coordinates of the target and let $g(t)$ represent the observational error. Then $f(t)$ and $g(t)$ are both time series in the sense just defined. The set of all such functions corresponding to the various possible target courses and tracking errors forms an ensemble of time series, or a statistical population. One can imagine that a large number of particular functions $f(t)$ and $g(t)$ have been recorded, each with a frequency proportional to its actual frequency of occurrence. Wiener assumes that they are stationary, that is, that the statistical properties of the ensemble are independent of the origin of time. This, of course, implies that both functions exist from $t = -\infty$ to $t = +\infty$. We will sometimes find it more convenient to make the assumption that the two functions vanish beyond some fixed, but sufficiently remote, points on the positive and negative real $t$ axis.ᵃ

The input signal to the computer is of course $f(t) + g(t)$. If we assume that the coordinate in question represents a position, the quantity we wish to obtain is $f(t + t_f)$, where $t_f$ represents the prediction time. If the coordinate is a rate, we are interested in an average value of $f(t)$ over the prediction interval. This complicates the mathematics somewhat, but does not essentially affect the situation.

a This is done for technical mathematical reasons. We shall later have occasion to consider the Fourier transforms of $f(t)$ and $g(t)$, and, to have well-defined transforms, the integrals of the squares of the two functions, from $t = -\infty$ to $t = +\infty$, should be finite. This would not happen under the "stationary" assumption. Wiener avoids the difficulty by introducing what he calls a generalized harmonic analysis, but this method is far too complicated to be treated in a brief sketch like the present one.

We shall not, of course, be able to predict $f(t + t_f)$ perfectly accurately. Let the predicted value be represented by $f^*(t + t_f)$. In virtue of our assumption that the data-smoothing and prediction circuit is to be a linear invariable network, the relation between $f^*(t + t_f)$ and the total input signal $f(t) + g(t)$ can be written as

$$f^*(t + t_f) = \int_{0}^{\infty} \left[ f(t - \sigma) + g(t - \sigma) \right] dK(\sigma) \qquad (1)$$

where $dK(\sigma)$ represents the effect of the data-smoothing and prediction circuit. Comparison with equations (2) and (5) of Appendix A shows that $K$ is, in fact, the indicial admittance of this circuit. The particular problem to be solved is of course that of finding a shape for the function $K(\sigma)$ which will make $f^*(t + t_f)$ the best possible estimate of $f(t + t_f)$.

The fact that the integration in equation (1) extends only over $\sigma \ge 0$, that is, over present and past values of the input, is particularly to be noted. It corresponds to the fact that in making a prediction we are entitled to use only the input data which have accumulated up to the prediction instant. This restriction will be conspicuous in the next chapter, where the time-series analysis is completed.
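In sampled form, assuming equation (1) is the convolution $f^*(t+t_f)=\int_0^\infty [f(t-\sigma)+g(t-\sigma)]\,dK(\sigma)$, the Stieltjes integral becomes a finite sum in which the jumps of $K$ at delays of 0, 1, 2, ... sample periods act as weights on the present and past observations. The sketch below (function name and weights hypothetical) makes the restriction to data already accumulated explicit: it never indexes a future sample.

```python
from typing import Sequence


def causal_prediction(samples: Sequence[float], weights: Sequence[float]) -> float:
    """Discrete analogue of equation (1) (illustrative sketch).

    samples[-1] is the newest observation f + g at time t, samples[-2] the
    one a sample period earlier, and so on.  weights[k] plays the role of
    the jump dK at a delay of k sample periods.  Only present and past
    samples are used, so the predictor is physically realizable.
    """
    if len(weights) > len(samples):
        raise ValueError("not enough past data for the chosen weights")
    return sum(w * samples[-1 - k] for k, w in enumerate(weights))


# Hypothetical weights for linear extrapolation 3 sample periods ahead:
# f*(t + 3) = f(t) + 3 * [f(t) - f(t - 1)] = 4 f(t) - 3 f(t - 1).
observed = [10.0, 12.0, 14.0, 16.0]
print(causal_prediction(observed, [4.0, -3.0]))  # prints 22.0
```

On a noise-free straight line this weighting is exact (16 + 3 × 2 = 22); with noisy input, such short weightings are exactly what the smoothing problem must improve upon.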


The principal statistical tool used in studying equation (1) is the so-called autocorrelation. Under the "stationary" assumption the autocorrelation for $f(t)$ is defined by

$$\phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau)\, f(t)\, dt. \qquad (2)$$

We can obtain a normalized autocorrelation, which is more convenient for some purposes, by dividing by $\phi_1(0)$. This gives

$$\phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau)\, f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \qquad (3)$$

If we assume that $f(t)$ in fact vanishes for sufficiently large positive or negative values of $t$, the limit sign can be disregarded and $\phi_{1N}(\tau)$ becomes simply

$$\phi_{1N}(\tau) = \frac{\int_{-\infty}^{\infty} f(t + \tau)\, f(t)\, dt}{\int_{-\infty}^{\infty} [f(t)]^2\, dt}. \qquad (4)$$

The denominator, $\int_{-\infty}^{\infty} [f(t)]^2\, dt$, represents the total "energy" in the time series $f(t)$.

Precisely similar expressions can be set up
for the autocorrelation $\phi_2(\tau)$ or $\phi_{2N}(\tau)$ of the
observational error function $g(t)$. In a general
case we might also have to worry about
a possible cross correlation between $f(t)$ and
$g(t)$. This would be represented by a cross-correlation
function $\phi_{12}(\tau)$, obtained by integrating
the product $f(t + \tau)\,g(t)$. In practical
fire control, however, it can be assumed that
the correlation between target course and
tracking errors is small enough to be neglected.
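For a function that vanishes at large $|t|$, the normalized autocorrelation of equation (4) is the integral of $f(t+\tau)f(t)$ divided by the energy $\int [f(t)]^2\,dt$, and it can be checked numerically. The sketch assumes the illustrative pulse $f(t) = e^{-|t|}$ (not an example from the monograph), for which direct integration gives $\phi_{1N}(\tau) = (1+|\tau|)e^{-|\tau|}$; both integrals are approximated by sums on a truncated grid.

```python
import numpy as np


def normalized_autocorrelation(f, tau, t_max=40.0, dt=1e-3):
    """Riemann-sum approximation of equation (4) for a finite-energy f."""
    t = np.arange(-t_max, t_max, dt)
    energy = np.sum(f(t) ** 2) * dt
    return np.sum(f(t + tau) * f(t)) * dt / energy


pulse = lambda t: np.exp(-np.abs(t))  # illustrative finite-energy "course"
for tau in (0.0, 0.5, 2.0):
    exact = (1 + abs(tau)) * np.exp(-abs(tau))
    print(tau, normalized_autocorrelation(pulse, tau), exact)
```

The value at $\tau = 0$ is exactly 1 by construction, and the function decays toward zero as $\tau$ grows, as expected for a course that "hangs together" only over a limited interval.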

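As a simple example of the calculation of an autocorrelation we may assume that $f(t) = \sin \omega t$. Then

$$\begin{aligned}
\phi_1(\tau) &= \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} \sin\omega(t+\tau)\,\sin\omega t\; dt \\
&= \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\tfrac{1}{2}\left[\cos\omega\tau - \cos(2\omega t + \omega\tau)\right] dt \\
&= \tfrac{1}{2}\cos\omega\tau, \qquad (5)
\end{aligned}$$

since the term $\cos(2\omega t + \omega\tau)$ will contribute nothing in the limit.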
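Equation (5) can be checked by evaluating the time average of (2) over a finite interval and letting $T$ grow; the oscillating term dies out roughly like $1/T$. The values of $\omega$, $\tau$ and the sampling step below are arbitrary choices for illustration.

```python
import numpy as np


def time_average_autocorrelation(f, tau, T, dt=1e-3):
    """Finite-T version of (2): (1/2T) * integral over (-T, T) of f(t+tau) f(t) dt."""
    t = np.arange(-T, T, dt)
    return np.mean(f(t + tau) * f(t))


omega = 2.0
f = lambda t: np.sin(omega * t)
for T in (5.0, 50.0, 500.0):
    approx = time_average_autocorrelation(f, 0.3, T)
    print(T, approx, 0.5 * np.cos(omega * 0.3))
```

The averaged value approaches $\tfrac12\cos\omega\tau$ as $T$ increases, and it does so for every $\tau$, which is the "non-decaying" behavior discussed next.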

The maximum value of $\phi_1(\tau)$ in (5) is found at $\tau = 0$. This is to be expected, since obviously the correlation between identical values of the function is the best possible. What is exceptional about the present result is the fact that $\phi_1(\tau)$ does not die away as $\tau$ becomes large. This is fundamentally a consequence of the fact that we chose an analytic expression for $f(t)$, so that the relation between two values of the function is completely determinate, no matter how great the difference between their arguments. In a more representative time series, involving a certain amount of statistical uncertainty, we would expect $\phi_1(\tau)$ to approach zero as $\tau$ increases, reflecting the increasing importance of statistical dispersion as the time interval becomes greater.

The significance of the autocorrelation function for data smoothing and prediction is obvious without much study. Thus, suppose for simplicity that the observational error $g(t)$ is zero. Then the autocorrelation $\phi_1(\tau)$ is the only one involved. It is a measure of the extent to which the true target path "hangs together" and is thus predictable. For example, in weather forecasting it is a well-known principle that in the absence of any other information it is a reasonably good bet that tomorrow's weather will be like today's, but that the reliability of such a prediction diminishes rapidly if we attempt to go beyond two or three days. This would correspond to an autocorrelation function which is fairly large in the neighborhood of $\tau = 0$ but diminishes rapidly to zero for lags beyond two or three days.

In a similar way the autocorrelation of the observational error $g(t)$ represents the extent to which this error hangs together. In this case, however, a high correlation is exactly what we do not want. Thus, if $\phi_2(\tau)$ vanishes rapidly as $\tau$ increases from zero, closely neighboring values of $g$ are quite uncorrelated, and we need only average the input data over a short interval in the immediate past in order to have most of the observational errors averaged out. If $\phi_2(\tau)$ is substantial over a much longer range, on the other hand, a much longer averaging period is necessary, with correspondingly greater uncertainties in the value obtained for $f(t)$.
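The point about averaging time can be made quantitative under an assumed error model. If the sampled tracking error has variance $s^2$ and normalized autocorrelation $\rho^{|k|}$ between samples $k$ periods apart (a first-order Markov model chosen for illustration, not one proposed in the monograph), the variance of a plain average of $N$ consecutive samples is $(s^2/N^2)\sum_j\sum_k \rho^{|j-k|}$. The sketch evaluates this directly.

```python
def variance_of_average(n_samples: int, rho: float, sigma: float = 1.0) -> float:
    """Variance of the mean of n_samples errors whose correlation is rho**|j - k|."""
    total = 0.0
    for j in range(n_samples):
        for k in range(n_samples):
            total += rho ** abs(j - k)  # 0.0 ** 0 == 1.0, so rho = 0 gives white noise
    return sigma ** 2 * total / n_samples ** 2


for rho in (0.0, 0.9):
    print(rho, [round(variance_of_average(n, rho), 3) for n in (10, 100)])
```

With uncorrelated errors ($\rho = 0$) the variance falls as $1/N$; with $\rho = 0.9$ the same 100-sample average leaves roughly seventeen times as much variance, so a much longer averaging period is needed for the same accuracy.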


The autocorrelation function does not by itself
suffice to determine a time series completely.
For example, it is easily seen that the
functions sin t + sin 2t and sin t + cos 2t have
the same autocorrelation in spite of the fact 
that they represent waves of quite different 
shape. The autocorrelation function, however, 
has a peculiar importance in the fact that 
under many circumstances it is the only piece 
of information about the time series which we 
need to know. 
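A numerical check of this example: by the same reasoning as in (5), every cross term averages to zero, so both functions have the autocorrelation $\tfrac12\cos\tau + \tfrac12\cos 2\tau$ even though they differ pointwise. The averaging interval and step are arbitrary choices.

```python
import numpy as np


def time_average_autocorrelation(f, tau, T=2000.0, dt=1e-2):
    """Finite-T estimate of (1/2T) * integral over (-T, T) of f(t + tau) f(t) dt."""
    t = np.arange(-T, T, dt)
    return np.mean(f(t + tau) * f(t))


f_a = lambda t: np.sin(t) + np.sin(2 * t)
f_b = lambda t: np.sin(t) + np.cos(2 * t)
for tau in (0.0, 0.7, 1.9):
    expected = 0.5 * np.cos(tau) + 0.5 * np.cos(2 * tau)
    print(tau, time_average_autocorrelation(f_a, tau),
          time_average_autocorrelation(f_b, tau), expected)

# The two waveforms are nonetheless different, e.g. at t = 0:
print(f_a(0.0), f_b(0.0))  # 0.0 versus 1.0
```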

The significance of the autocorrelation becomes apparent as soon as we investigate the error in prediction. In many mathematical situations involving linear systems it is convenient to deal with the square of the error rather than with the error itself, since a first variation in the error-squared expression gives a linear relationship in the quantities of direct interest. We will deal with the square of the error here. If $E$ represents the instantaneous error, $f^*(t + t_f) - f(t + t_f)$, the mean square error over a long period of time is evidently

$$\begin{aligned}
\overline{E^2} &= \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left[f^*(t+t_f) - f(t+t_f)\right]^2 dt \\
&= \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left[f(t+t_f)\right]^2 dt
 - \lim_{T\to\infty}\frac{1}{T}\int_{-T}^{T} f(t+t_f)\,f^*(t+t_f)\,dt \\
&\quad + \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\left[f^*(t+t_f)\right]^2 dt. \qquad (6)
\end{aligned}$$

The first integral in equation (6) can be evaluated immediately. From (2) it is $\phi_1(0)$. To evaluate the second integral, replace $f^*(t + t_f)$ by its definition from (1). This gives

$$-\lim_{T\to\infty}\frac{1}{T}\int_{-T}^{T} f(t+t_f)\,dt\int_{0}^{\infty}\left[f(t-\tau)+g(t-\tau)\right]dK(\tau)
= -\lim_{T\to\infty}\frac{1}{T}\int_{0}^{\infty} dK(\tau)\int_{-T}^{T}\left[f(t+t_f)\,f(t-\tau) + f(t+t_f)\,g(t-\tau)\right]dt$$

if we reverse the order of integration. Since we assume that $f$ and $g$ are uncorrelated, however, the product $f(t + t_f)\,g(t - \tau)$ in this expression makes no contribution to the final result, and by replacing the integral of $f(t + t_f)\,f(t - \tau)$ by its value in terms of $\phi_1$ the expression as a whole can be written as

$$-2\int_{0}^{\infty}\phi_1(t_f+\tau)\,dK(\tau).$$

The third integral in (6) can be simplified in similar fashion. The final result becomes

$$\overline{E^2} = \phi_1(0) - 2\int_{0}^{\infty}\phi_1(t_f+\tau)\,dK(\tau) + \int_{0}^{\infty}dK(\tau)\int_{0}^{\infty}\left[\phi_1(\tau-\sigma)+\phi_2(\tau-\sigma)\right]dK(\sigma). \qquad (7)$$
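Equation (7) can be checked numerically. If $dK$ consists of jumps $w_k$ at delays $\tau_k$, so that $f^*(t+t_f) = \sum_k w_k[f(t-\tau_k)+g(t-\tau_k)]$, then (7) reads $\overline{E^2} = \phi_1(0) - 2\sum_k w_k\phi_1(t_f+\tau_k) + \sum_j\sum_k w_j w_k[\phi_1(\tau_j-\tau_k)+\phi_2(\tau_j-\tau_k)]$. The sketch compares this with a direct time average, assuming a sinusoidal "target path" and a sinusoidal "tracking error" at an unrelated frequency so that the two are effectively uncorrelated; the weights, delays and frequencies are arbitrary.

```python
import numpy as np


def sinusoid_autocorrelation(amplitude, omega):
    """phi(tau) for amplitude * sin(omega * t), i.e. (amplitude**2 / 2) cos(omega * tau)."""
    return lambda tau: 0.5 * amplitude ** 2 * np.cos(omega * tau)


def mse_from_autocorrelations(weights, lags, t_f, phi1, phi2):
    """Equation (7) with dK made of jumps `weights` at the delays `lags`."""
    w = np.asarray(weights, dtype=float)
    tau = np.asarray(lags, dtype=float)
    lag_diff = tau[:, None] - tau[None, :]
    return (phi1(0.0)
            - 2.0 * np.sum(w * phi1(t_f + tau))
            + w @ (phi1(lag_diff) + phi2(lag_diff)) @ w)


def mse_by_time_average(weights, lags, t_f, f, g, T=2000.0, dt=0.01):
    """Direct time average of [f*(t + t_f) - f(t + t_f)]**2 over -T < t < T."""
    t = np.arange(-T, T, dt)
    f_star = sum(w * (f(t - d) + g(t - d)) for w, d in zip(weights, lags))
    return np.mean((f_star - f(t + t_f)) ** 2)


f = lambda t: np.sin(t)              # assumed true path
g = lambda t: 0.5 * np.sin(7.3 * t)  # assumed tracking error, unrelated frequency
weights, lags, t_f = [0.9, 0.4, -0.3], [0.0, 0.5, 1.0], 2.0

predicted = mse_from_autocorrelations(weights, lags, t_f,
                                      sinusoid_autocorrelation(1.0, 1.0),
                                      sinusoid_autocorrelation(0.5, 7.3))
measured = mse_by_time_average(weights, lags, t_f, f, g)
print(predicted, measured)
```

The two numbers agree to within the finite-$T$ averaging error, even though the direct computation never forms an autocorrelation, which illustrates that only $\phi_1$, $\phi_2$ and $K$ matter.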

The only quantities appearing in equation
(7) are the autocorrelations, $\phi_1$ and $\phi_2$, of the
true target path and the observational error,
and the function $K$ which specifies the data-smoothing
structure. The theoretical problem
with which we are confronted is evidently that
of choosing $K$ to make the mean square error
as small as possible for any given $\phi$'s. This
problem will not be attacked here, although a 
solution obtained by a somewhat indirect 
method is presented in the next chapter. The 
principal reason for deriving equation (7) is 
to demonstrate the very important fact that 
the mean square error depends only upon the 
two autocorrelations. No other characteristics 
of the input data need be considered. 

It will be recalled that the mean square cri- 
terion was introduced originally on the ground 
of mathematical convenience. This leaves un- 
settled the question of how good a measure of 
performance for a data-smoothing network it
actually is. This is a critical question, since 
upon it depends the validity of the whole ap- 
proach outlined in this chapter. A priori, the 
least squares criterion is a dubious one since 
it gives principal weight to large errors. In 
fire control we are normally interested only in 
shots which are close enough to register as hits. 
If a shot misses it makes little difference 
whether the miss is large or small. The merits 
of the least squares criterion are considered 
in more detail in Chapter 9, where the conclu- 
sion is reached that the criterion is probably 
adequate for many problems but needs to be 
supplemented or replaced in others, including 
the special case of heavy antiaircraft fire to 
which particular attention is given in this 
monograph. Pending the discussion in Chapter 
9, the least squares criterion will be assumed 
to be a valid one, with the understanding that 
the analysis is intended primarily for its value 
in contributing to the general understanding of 
the data-smoothing problem rather than as a 
means of fixing the exact proportions of an op- 
timal smoothing network. 
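The objection to the least squares criterion can be made concrete with two invented sets of miss distances. Assuming a lethal radius of 1 (arbitrary units), system B below has the smaller mean square error but never scores a hit, while system A hits 90 percent of the time despite its occasional wild misses. The numbers are purely illustrative and do not come from the monograph.

```python
def mean_square(errors):
    """Least squares measure of aiming error."""
    return sum(e * e for e in errors) / len(errors)


def hit_fraction(errors, lethal_radius):
    """Fraction of shots whose miss distance lies within the lethal radius."""
    return sum(abs(e) <= lethal_radius for e in errors) / len(errors)


# Hypothetical miss distances for two aiming systems.
system_a = [0.5] * 90 + [10.0] * 10   # usually very close, occasionally far off
system_b = [3.0] * 100                # always moderately off

for name, errs in (("A", system_a), ("B", system_b)):
    print(name, mean_square(errs), hit_fraction(errs, lethal_radius=1.0))
```

The mean square error ranks B ahead of A (9.0 against about 10.2) because it is dominated by A's ten large misses, which, as far as hits are concerned, cost no more than small misses would.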


The time-series approach to data smoothing 
is closely associated with another which at first 
sight may seem quite different. This second 
approach is suggested by the procedures used 
in communication engineering. Here the sig- 
nals, be they voice, music, television, or what 
not, are again time series. Instead of dealing
with actual signals varying in a more or less
irregular and random manner with time, however,
it is customary to deal with their equivalent
steady-state components on the frequency
spectrum.ᵇ

The analysis of data smoothing can conven- 
iently be approached by supposing that both 
the true path of the target and the effects of 
tracking errors are represented, in a similar 
way, by their frequency spectra. When the 
situation is presented in this way, however, 
there is an obvious analogy between the prob- 
lem of smoothing the data to eliminate or re- 
duce the effect of tracking errors and the prob- 
lem of separating a signal from interfering 
noise in communication systems. We may take 
as an example of the latter the transmission 
of voice or music by ordinary radio over fairly 
long distances, so that the effects of static in- 
terference are appreciable. In such a system 
a reasonable separation of the desired signal 
from the static can be obtained by means of 
a filter. In a representative situation an appropriate filter might transmit frequencies up to perhaps 2,000 or 3,000 cycles per second,ᶜ while rejecting higher frequencies.

The choice of any specific cutoff, such as 2,000 or 3,000 c, in the radio system depends upon a compromise between conflicting considerations. Both speech or music and static normally include components of all frequencies which can be heard by the human ear. Thus, suppressing any frequency range below the limits of audibility, at perhaps 10,000 or 20,000 c, will injure the signal to some extent. The intensity of the signal components, however, diminishes rapidly above 2,000 or 3,000 c, while the energy of the static interference is more evenly distributed over the spectrum. Thus, by transmitting only the first 2,000 or 3,000 c, we can retain most of the signal while rejecting most of the noise. Naturally, the exact dividing line will depend upon the relative levels of signal and noise power. If the static interference is quite weak, for example, it would be worthwhile to transmit a considerably wider band in order to retain a more nearly perfect signal. If the static level is extremely high, on the other hand, it would be necessary to transmit a still narrower band at the cost of greater mutilation of the signal.

b The review of communication theory given in Appendix A shows how this equivalence is established by Fourier or Laplace transform methods.

c In practice, of course, the filtering would probably take place in the radio-frequency circuits, but it is more convenient here to think of it as occurring in the demodulated output.
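The dependence of the best band on the noise level can be sketched with a toy model. It assumes a signal power spectrum $S(\omega) = 1/(1+\omega^2)$, noise spread evenly at density $n$, an ideal low-pass filter with cutoff $\omega_c$, and a total error equal to the signal power rejected plus the noise power passed (reasonable only if signal and noise are uncorrelated). Setting the derivative to zero places the optimum where $S(\omega_c) = n$, that is $\omega_c = \sqrt{1/n - 1}$. The spectra, units and names are illustrative assumptions, not taken from the text.

```python
import math


def band_error(cutoff, noise_density):
    """Signal power rejected plus noise power passed by an ideal low-pass filter.

    Assumed spectra: signal S(w) = 1 / (1 + w**2), noise flat at noise_density.
    """
    lost_signal = math.pi / 2 - math.atan(cutoff)  # integral of S from cutoff to infinity
    passed_noise = noise_density * cutoff
    return lost_signal + passed_noise


def best_cutoff(noise_density, w_max=100.0, steps=100_000):
    """Grid search for the cutoff that minimizes band_error."""
    grid = (w_max * i / steps for i in range(steps + 1))
    return min(grid, key=lambda w: band_error(w, noise_density))


for n in (0.001, 0.01, 0.1):
    closed_form = math.sqrt(1 / n - 1)  # where S(cutoff) equals the noise level
    print(n, round(best_cutoff(n), 2), round(closed_form, 2))
```

Raising the noise density from 0.001 to 0.1 shrinks the optimum band from about 31.6 to about 3 frequency units, the same qualitative behavior described above for both the radio filter and the data-smoothing filter.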

The separation of the true path of a target 
from the observed path including tracking 
errors, as a preliminary to prediction of the 
future position of the target, presents an ap- 
proximately analogous situation. Again the 
spectrum of the "signal" or true path is con- 
centrated principally in a low-frequency band, 
in most instances, while the energy of tracking 
errors or "noise" appears principally at con- 
siderably higher frequencies. Thus the two can 
be separated by a low-pass filter. The separa- 
tion, however, is not complete since some com- 
ponents of the signal spectrum extend into the 
noise region. Thus the smoothing process must 
be accompanied by some mutilation of the sig- 
nal, and the optimum compromise is again 
attained from a filter which transmits a rela- 
tively broad band when the tracking errors are 
of low intensity and a much narrower band 
when they are large. 

In these terms the most obvious difference 
between the data-smoothing problem and the 
static interference problem in the radio system 
is in the order of magnitude of the frequencies 
involved. They are roughly 10,000 times smaller 
in the data-smoothing case. Thus, the typical 
signal band in a fire-control system may cover 
a few tenths of a cycle per second, in compari- 
son with a useful band of 2,000 or 3,000 c in a 
radio system, and the spectrum of tracking 
errors or noise, with representative tracking 
devices, includes appreciable components up to 
perhaps 2 or 3 c, in comparison with a total 
effective noise band in the radio system ex- 
tending to the limits of audibility at perhaps 
20,000 c. 

This analogy between data smoothing and 
the filtering problems which appear in ordi- 
nary communication systems transmitting 
speech or music must of course not be carried 
too far. For example, previous experience with 
communication filters is of no help in fixing in 
detail the cutoff and attenuation characteristic of the data-smoothing filter, since in communication systems these choices depend on psychological considerations of no relevance in the fire-control problem. The best rules for proportioning a data-smoothing filter, therefore, remain to be determined. We
may also notice that, whereas the time-series 
approach was of the data-smoothing and pre- 
diction type, the filter approach emphasizes 
data smoothing only. The addition of the pre- 
diction function can be expected to change ma- 
terially the overall characteristics of the cir- 
cuit. Neither of these remarks, however, robs 
the filter approach of its value as a simple way 
of thinking about the problem qualitatively. 



The time-series and filter methods of looking 
at data smoothing are related to one another 
by the fact that the autocorrelation can be com- 
puted from the amplitude spectrum, or vice 
versa, by Fourier transform means. Consider, 
for example, the Fourier transform of the 
autocorrelation. If we make use in particular 
of (4) we have 

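$$
\begin{aligned}
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\phi(\tau)\,e^{-i\omega\tau}\,d\tau
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-i\omega\tau}\,d\tau\int_{-\infty}^{\infty}f(t+\tau)\,f(t)\,dt \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\,dt\int_{-\infty}^{\infty}f(t+\tau)\,e^{-i\omega\tau}\,d\tau \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\,e^{i\omega t}\,dt\int_{-\infty}^{\infty}f(t+\tau)\,e^{-i\omega(t+\tau)}\,d\tau \\
&= \sqrt{2\pi}\,F(\omega)\,F(-\omega) = \sqrt{2\pi}\,|F(\omega)|^{2}, \qquad (8)
\end{aligned}
$$

where

$$F(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}f(t)\,e^{-i\omega t}\,dt,$$

and the last step uses $F(-\omega) = \overline{F(\omega)}$ for a real function f(t).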

F(ω) is of course the steady-state spectrum of the signal f(t). Equation (8) thus states that the Fourier transform of φ(τ) is equal to a constant times the square of the amplitude of the steady-state spectrum. The amplitude-squared spectrum is, however, a measure of the power per cycle. The relation is therefore equivalent to the statement that the autocorrelation and power spectrum are Fourier transforms of each other.
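Equation (8) has an exact discrete counterpart that is easy to check numerically. The sketch below is a discrete, circular (periodic) analogue rather than the continuous integrals of the text: it computes the autocorrelation of a sampled record directly and as the inverse DFT of $|F|^2$, then scrambles the phases of the record while keeping every $|F|$, which leaves the autocorrelation, and anything that depends only on it, unchanged. The test record is an arbitrary random walk.

```python
import numpy as np


def autocorr_direct(f):
    """Circular autocorrelation phi[k] = sum_t f[t + k] * f[t], indices taken mod N."""
    return np.array([np.dot(np.roll(f, -k), f) for k in range(len(f))])


def autocorr_via_spectrum(f):
    """The same quantity as the inverse DFT of |F|**2 (discrete Wiener-Khinchin relation)."""
    F = np.fft.fft(f)
    return np.fft.ifft(np.abs(F) ** 2).real


def scramble_phases(f, rng):
    """Keep every |F| but give the components random phases (result stays real)."""
    F = np.fft.rfft(f)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=F.shape)
    phases[0] = 0.0                        # the zero-frequency term must stay real
    if len(f) % 2 == 0:
        phases[-1] = 0.0                   # so must the Nyquist term
    return np.fft.irfft(np.abs(F) * np.exp(1j * phases), n=len(f))


rng = np.random.default_rng(0)
f = rng.standard_normal(64).cumsum()       # an arbitrary test record
g = scramble_phases(f, rng)
print(np.allclose(autocorr_direct(f), autocorr_via_spectrum(f)))   # True
print(np.allclose(autocorr_direct(g), autocorr_direct(f)))         # True
```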

Since we have already established the fact 
that the mean square error in prediction de- 
pends only on the autocorrelation, this analysis 
enables us to conclude immediately that the 
mean square error can also be calculated from 
the power spectra of the signal and noise. It 
is entirely independent of the phase relations 
in either signal or noise. The phase characteristics of the data-smoothing network, which operates on the signal after a specific wave shape has been established, are, of course, still of consequence.


Thus far the material which has been pre- 
sented has been primarily mathematical. It 
has consisted, in other words, of outlines of 
general analytical methods which are available 
for use with the data-smoothing problem. It is 
also possible to approach the problem in a 
much more concrete fashion. It is obvious that 
by giving thought to the details of the physical 
characteristics of tracking units and targets, 
and to the tactical situations with which we 
expect to deal, it should be possible to draw a 
number of specific conclusions about the prob- 
lem as a whole. In a general theory of the de- 
sign and tactical use of fire-control apparatus 
such an approach might well be a primary one. 
It is scarcely possible to follow it in detail in 
the present discussion. The following para- 
graphs, however, indicate some of the kinds of 
considerations which can be brought into the 
problem in this way. It will be seen that they 
tend to modify the strictly mathematical ap- 
proach, partly by qualifying to some extent the 
assumptions made in the mathematics, and 
partly by tending to give much more emphasis 
to particular aspects of the problem than would 
appear in a general analytic outline. 

Choice of Coordinates

One of the most obvious omissions in the 
general analysis thus far is any consideration 
of the choice of coordinates in which the data smoothing is to take place. So far as either
the statistical or filter theory is concerned, the 
coordinates in the data smoother may repre- 
sent either the original tracking data or any 
transformation of them. The fact that there is 
actually something to be decided here, however, 
is easily seen from the long-range antiaircraft 
problem. The input tracking coordinates for 
antiaircraft would normally be azimuth, eleva- 
tion, and slant range. If the airplane flies in a 
straight line roughly overhead, the general 
shape of the azimuth and the azimuth rate as 
functions of time are given by the curves in 
Figure 2. The curves become indefinitely steeper as the target path approaches the zenith, and it will be seen that if the approach is reasonably close, either the azimuth or the azimuth rate must include a very substantial amount of high-frequency energy. Since the possibility of an effective separation between the signal and noise in the filter approach depends upon the assumption that the signal components are of quite low frequency with respect to the noise, the presence of this high-frequency energy is evidently serious.

Figure 2. Azimuth and azimuth rate for crossing ...
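The crossing-course effect can be quantified under a simple assumed geometry: a level, straight course at speed v whose ground track passes the director at minimum horizontal distance d. Then the azimuth is $A(t) = \arctan(vt/d)$, and its rate $dA/dt = vd/(d^2 + v^2t^2)$ is a pulse of height v/d lasting about 2d/v between its half-height points. Halving the offset doubles the peak rate and halves the pulse duration, pushing its energy toward higher frequencies. The speed and offsets below are hypothetical.

```python
import numpy as np


def azimuth_rate(t, speed, offset):
    """Azimuth rate (rad/s) for an assumed straight, level crossing course.

    The ground track passes the director at minimum horizontal distance
    `offset` (m) at t = 0, at constant `speed` (m/s), so that
    A(t) = arctan(speed * t / offset) and dA/dt = speed*offset / (offset**2 + (speed*t)**2).
    """
    return speed * offset / (offset ** 2 + (speed * t) ** 2)


def peak_and_width(speed, offset, dt=1e-3, span=200.0):
    """Peak azimuth rate and the duration over which the rate exceeds half the peak."""
    t = np.arange(-span, span, dt)
    rate = azimuth_rate(t, speed, offset)
    peak = rate.max()
    above_half = t[rate >= peak / 2]
    return peak, above_half[-1] - above_half[0]


for offset in (5000.0, 500.0, 50.0):
    peak, width = peak_and_width(150.0, offset)
    print(f"offset {offset:6.0f} m: peak rate {peak:.3f} rad/s, half-peak width {width:.2f} s")
```

For v = 150 m/s the azimuth-rate pulse lasts roughly 67 s at a 5,000 m offset but well under a second at 50 m.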

When the target describes a violently evasive 
path the signal spectrum must naturally in- 
clude substantial high-frequency components, 
whatever the coordinate system may be. The 
high-frequency components indicated in Figure 
2, however, are due to the fact that the target 
path happens to pass almost over the director 
and are essentially superimposed upon the 
high-frequency components which reflect the 
complexity of the target path itself. It is clear as a matter of principle that an acceptable
coordinate system for data smoothing should 
not introduce frequency components which de- 
pend upon such accidental factors as the loca- 
tion and orientation of the coordinate system. 
The rectangular system mentioned in connec- 
tion with Figure 1 evidently meets this condi- 
tion; so also does the "intrinsic" system de- 
scribed in the next section. 

Physical Limitations of Target or Tracker 

We may also approach the data-smoothing 
question by a consideration of the motions 
which are physically possible either in the 
target or in the tracking device. In the heavy 
antiaircraft problem, for example, there are 
substantial physical limitations on the performance possibilities of present-day aircraft.
We can be quite sure that any motion incom- 
patible with these limitations is necessarily a 
tracking error and can be removed from the 
incoming data. Naturally, these limitations 
must appear in the power spectrum of the sig- 
nal if they affect the mean square error in pre- 
diction, so that their existence in no way dis- 
putes the mathematical framework we have 
set up. Consideration of the physical factors 
which produce them, however, may permit 
them to be established more easily or in more 
clear-cut fashion than would be possible from 
a statistical examination of target records.

The limitations on airplane performance 
can be stated most simply when the motion of 
the airplane is expressed in so-called intrinsic 
coordinates. These are the speed of the air- 
plane, its heading, and its angle of dive or 
climb. The maneuvering possibilities of a con- 
ventional airplane in these three directions are 
quite unequal. By banking sharply it can 
maneuver violently to the right and left and 
thus make quick changes in heading. The pos- 
sibilities of maneuvering up and down, how- 
ever, are considerably less, particularly for a 
heavy airplane, where there are usually restric- 
tions on the maximum angle of dive or climb 
which can be assumed. The possibilities of 
quickly changing the speed of the airplane, 
finally, are almost nil. The thrust of an airplane propeller is so small in comparison with the mass of the airplane that only small accelerations are possible.*

Thus the optimum filters for the three coor- 
dinates should be different. The one for speed 
can have a very narrow band, since most of 
the signal energy for this coordinate occurs at 
very low frequencies. The optimum band for 
the angle of dive or climb, however, should be 
larger (unless it turns out that pilots seldom 
make use of maneuvering possibilities in this 
direction) and the one for the heading larger 
still. In this ability to discriminate among the 
various possible directions of motion the in- 
trinsic coordinate system is evidently an im- 
provement even on the rectangular system. 
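A minimal sketch of converting Cartesian velocity into the intrinsic coordinates above and then smoothing each coordinate with its own bandwidth. The axis and angle conventions and the time constants are assumptions for illustration; the text fixes only their ordering (speed narrowest, heading widest). The first-order smoother stands in for whatever filter would actually be designed, and heading is unwrapped before smoothing so that a turn through north does not look like a jump of 2π.

```python
import numpy as np

# Hypothetical time constants (s), ordered as the text suggests:
# speed gets the narrowest band, heading the widest.
TAU = {"speed": 10.0, "climb": 3.0, "heading": 1.0}


def to_intrinsic(vx, vy, vz):
    """Cartesian velocity -> (speed, heading, climb angle).

    Assumed conventions: x east, y north, z up; heading measured clockwise
    from north; climb angle positive upward, negative in a dive.
    """
    speed = np.sqrt(vx ** 2 + vy ** 2 + vz ** 2)
    heading = np.arctan2(vx, vy)
    climb = np.arctan2(vz, np.hypot(vx, vy))
    return speed, heading, climb


def smooth(x, dt, tau):
    """First-order exponential smoother; a larger tau means a narrower pass band."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    a = dt / (tau + dt)
    for k in range(1, len(x)):
        y[k] = y[k - 1] + a * (x[k] - y[k - 1])
    return y


def smooth_intrinsic(vx, vy, vz, dt):
    """Smooth each intrinsic coordinate with its own (assumed) time constant."""
    speed, heading, climb = to_intrinsic(vx, vy, vz)
    return (smooth(speed, dt, TAU["speed"]),
            smooth(np.unwrap(heading), dt, TAU["heading"]),   # unwrap: no 2*pi jumps
            smooth(climb, dt, TAU["climb"]))
```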

Settling Time 

Another aspect of the data-smoothing prob- 
lem which has not been given conspicuous at- 
tention in the purely mathematical discussion 
is the fact that in an actual tactical situation 
questions of elapsed time are of great importance. Engagements usually begin suddenly
and last for a comparatively brief period, and 
it is important to find a data-smoothing scheme 
which provides adequate firing data as quickly 
as possible after an engagement starts. A situ- 
ation essentially similar to the beginning of an 
engagement may also be presented whenever 
the target makes a sudden change of course or 
whenever it is necessary to shift from one 
target to another in a given attacking body. 
The time required for a computer to give 
usable output data after any of these events is 
its so-called "settling time," and is one of the 
most important parameters of any data- 
smoothing system. It is possible to make rough 
estimates of settling time by indirect means in 
both the statistical and filter theories of data 
smoothing, but no explicit consideration of 
necessary time lapses appears in either theory. 
Evidently, the fundamental fault lies with the 
"stationary" assumption. 
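The settling-time trade-off can be illustrated with the simplest smoother, a first-order exponential filter with time constant τ (an assumption; the text names no particular filter). A sudden change of course is modeled as a unit step in the smoothed quantity. The sketch measures how long the output takes to come within 5% of the new value and, alongside it, how much white tracking-error variance the same filter lets through. A larger τ, which means a narrower band, reduces the noise but lengthens the settling time in proportion, to roughly 3τ.

```python
import numpy as np


def settling_time(tau, dt=0.01, tolerance=0.05, t_max=200.0):
    """Time for a first-order smoother (time constant tau, s) to come within
    `tolerance` of a new constant input after a unit step at t = 0."""
    a = dt / tau
    y = 0.0
    for k in range(int(t_max / dt)):
        y += a * (1.0 - y)                 # y[k] = y[k-1] + a * (x - y[k-1]), x = 1
        if abs(1.0 - y) <= tolerance:      # step response is monotone, so this is final
            return (k + 1) * dt
    return float("inf")


def output_noise_variance(tau, dt=0.01, input_variance=1.0):
    """Variance passed by the same smoother when its input is white noise:
    a**2 / (1 - (1 - a)**2) = a / (2 - a) times the input variance."""
    a = dt / tau
    return input_variance * a / (2.0 - a)


for tau in (1.0, 2.0, 4.0):
    print(tau, settling_time(tau), output_noise_variance(tau))
```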

* This ignores the possibility of changing the speed 
through gravitational forces. Since these possibilities 
are linked to the angle of dive or climb, however, they 
can be predicted. This has actually been done in one 
experimental computer. 

Effect of Human Factors 

Aside from the conditions on target perform- 
ance which arise from the physical character- 
istics of the target itself, there are others 
which are due to the fact that the target is 
under the control of a human being with a 
definite purpose. The language of the statistical 
and filter methods is broad enough to cover 
almost any situation. It tends to suggest, how- 
ever, that the typical target paths with which 
we deal are the relatively structureless conse- 
quences of random physical forces. The inter- 
vention of purposive human behavior, on the 
other hand, tends to give paths which fall into 
more or less definite patterns. A simple illus- 
tration is furnished by the argument which is 
frequently offered in defense of the straight 
line assumption in dealing with antiaircraft 
defense against heavy bombers. It is contended 
that while the targets may in fact engage in 
substantial evasive maneuvers during most of 
their flight, there will always be a substantial 
period during the bombing run in which they 
must fly very straight in order to achieve 
bombing accuracy. On the basis of ordinary 
probability we would of course expect substan- 
tial straight line segments quite infrequently 
if the course as a whole shows marked disper- 
sion, and the intervention of the human pilot 
thus provides a higher degree of structure than 
one would expect in a corresponding situation 
dominated by purely natural factors. 

A broader example is furnished by a com- 
parison of two airplanes, or perhaps more 
simply of two boats, one of which is under the 
control of a human operator, while in the other 
the steering controls are lashed in a neutral 
position. Both boats, say, may be expected to 
experience small variations of course due to the 
random effects of wind and waves upon them. 
Over a short period of time the observed mo- 
tions of the two boats should be substantially 
identical. In the case of the boat with the 
lashed helm these random variations will tend 
to accumulate, so that it is possible to make a 
reasonable prediction of the position of the 
boat for only a comparatively short distance 
in the future. In the boat with the human 
steersman, on the other hand, we may expect 
corrections to be applied as soon as the random 
effects become large, so that the boat tends to retain the same general course and it is possible to predict its position hours or even days
later from a relatively brief observation. 
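The contrast between the two boats can be put in numbers with two standard models, assumed here only for illustration. The lashed helm is taken as a random walk in course deviation, $x_{k+1} = x_k + w_k$, and the helmsman as a correction that pulls the deviation back, $x_{k+1} = a x_k + w_k$ with $0 < a < 1$, where $w_k$ is white with variance $\sigma^2$. With the best linear predictor, the error variance h steps ahead is $h\sigma^2$ for the random walk, but for the corrected course it never exceeds $\sigma^2/(1-a^2)$, however far ahead we predict.

```python
def prediction_error_variance(horizon, sigma2, a=None):
    """Error variance of the best linear prediction `horizon` steps ahead.

    Assumed models for the course deviation x:
      a is None   -> lashed helm, random walk:   x[k+1] = x[k] + w[k]
      0 < a < 1   -> helmsman corrects drift:    x[k+1] = a * x[k] + w[k]
    where w[k] is white noise with variance sigma2.
    """
    if a is None:
        return horizon * sigma2                                  # grows without limit
    return sigma2 * (1.0 - a ** (2 * horizon)) / (1.0 - a ** 2)  # bounded by sigma2 / (1 - a**2)


for h in (1, 10, 100):
    print(h, prediction_error_variance(h, 1.0), prediction_error_variance(h, 1.0, a=0.9))
```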

Neither of these illustrations is inconsistent with the mathematical framework laid down earlier in the chapter, in a purely theoretical sense. For example, the bombing run illustration merely states that because of the presence of the human operator there are definite phase relations in the input signal. As we have seen, such relations can exist without affecting computations based on mean square error. The comparison between the piloted and pilotless boats can be interpreted as the result primarily of differences in the signal power spectrum. In the case of the pilotless boat, for example, the signal occupies a fairly continuous low-frequency band, while in the case of the piloted boat it must be regarded as concentrated very closely around zero frequency, so that it is approximately a line spectrum superimposed on a continuous one. The formal mathematical theory also covers such cases as these.

The point of this discussion, however, is that the mathematical theory, although it is sufficiently general in a formal sense, fails to differentiate between such situations as those just described and the more shapeless sort involving continuous spectra with random phase relations, even though the special features in these situations may be the controlling factors in determining the actual probability of hitting. If we could believe the bombing run hypothesis, for example, and had a sufficiently accurate computer and gun, we could expect to score a hit in every engagement, no matter how large the mean square error might be. More generally, it is probably only the tendency of targets to exhibit "line spectra" which prevents the real probability of a kill, small at best, from becoming microscopic. It is necessary to lay special emphasis on these factors in order to keep the overall fire control picture in perspective.

Last on this list of doubts about the statistical and filter theories, we may mention the least squares criterion of accuracy. This was discussed before, but it is mentioned again as a matter of emphasis, and because of its close relation with the factors we have just discussed. For example, the bombing run illustration obviously represents one situation in which the mean square error is not a good guide to the actual probability of scoring a hit.

Chapter 8 


It was shown in the previous chapter that
both the statistical and filter theory ways of
looking at the data-smoothing problem lead 
naturally to an analysis in terms of the power 
spectra of the signal and noise. The phase rela- 
tions are not important as long as we accept 
the mean square error as a criterion of per- 
formance. The inadequacies of the mean square 
criterion will finally force us to abandon the 
steady-state attack in favor of a direct analysis 
in terms of the wave shapes of some assumed 
signals. The steady-state attack is nevertheless 
a very useful one. This chapter will conse- 
quently continue the analysis from this point 
of view. It will be assumed as heretofore that 
the heavy antiaircraft problem is the particular 
subject of interest. 

A large part of the discussion hinges upon 
the conditions which must be satisfied by the 
external characteristics of an electrical net- 
work if it is to be capable of physical realiza- 
tion in any way whatever. These limitations 
and the characteristics which may be postulated 
for physical networks are decisive since, in the 
absence of such restrictions, no limits could be 
set upon the performance which might be ex- 
pected from data-smoothing and predicting 
circuits. The facts about physically realizable 
networks which we shall find of most use are 
summarized below, but the reader not familiar 
with this field is urged to read also the account 
given in Sections A.9 and A.10, Appendix A.
The conditions which must be satisfied by 
physically realizable networks can be stated in 
either transient or steady-state terms. In tran- 
sient terms they are expressed most simply by 
the statement that the response of a physical 
network to an impulsive force must be zero up 
to the time the force is applied. Thus the net- 
work has no power to predict a purely arbi- 
trary event. That is, it has no way of foresee- 
ing whether or not an impulse is actually going 
to be applied to it. This characteristic of physi- 
cal networks is taken as a postulate. 

The steady-state limitations on physical networks are expressed in terms of their attenuation and phase characteristics. They may be derived either from the transient specification or from the postulate that a physical network must be stable. There are no important limitations to be placed upon the attenuation and phase characteristics of physical networks as long as we deal with these characteristics separately, but there are very severe limitations on the phase characteristic which can be associated with any given attenuation characteristic or vice versa. In particular, when the attenuation characteristic is prescribed, there is a definite formula for calculating the unique limiting phase characteristic with which it may be associated. This is the so-called "minimum phase" characteristic because any other physical network having the postulated attenuation characteristic must have as great or greater phase shift at every frequency. As we shall see later, this greater phase characteristic would correspond to longer lags in obtaining usable data, so that the minimum phase characteristic is the optimum for a data-smoothing network. The minimum phase characteristic has the additional important property that not only does it specify the transfer admittance of a physical network, but the reciprocal of that transfer admittance can also be realized by a physical structure.
In addition to this principal formula for the 
relation between attenuation and phase there 
are a number of subsidiary expressions for 
special aspects of the problem. One in partic- 
ular, relating the attenuation to the behavior 
of the phase characteristic in the neighborhood 
of zero frequency, is used extensively in this 

In limiting cases, such as may be found when the
transfer admittance contains zeros or poles exactly on 
the real frequency axis, the "physical structure" may 
require such constituents as ideally nondissipative re- 
actances, perfect amplifiers with unlimited gain, etc. 
This, however, is of no consequence for the present 
general discussion. 
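The minimum phase property discussed above can be seen in a small example; the transfer functions are chosen for illustration and are not taken from the text. $H_1(s) = 1/(1+s)$ and $H_2(s) = (1-s)/(1+s)^2$ have the same attenuation at every real frequency, because the all-pass factor $(1-s)/(1+s)$ has unit magnitude on $s = i\omega$. However, $H_2$ has a zero in the right half-plane, and its phase lag is $3\arctan\omega$ against $\arctan\omega$ for $H_1$. The sketch checks both statements numerically.

```python
import numpy as np

omega = np.linspace(0.0, 100.0, 100_001)     # rad/s
s = 1j * omega
H_min = 1.0 / (1.0 + s)                      # pole at s = -1, no zeros: minimum phase
H_extra = (1.0 - s) / (1.0 + s) ** 2         # zero at s = +1: same gain, more phase lag

lag_min = -np.unwrap(np.angle(H_min))        # phase lag in radians, continuous in omega
lag_extra = -np.unwrap(np.angle(H_extra))

print(np.max(np.abs(np.abs(H_min) - np.abs(H_extra))))   # essentially zero: same attenuation
print(lag_min[-1], lag_extra[-1])                        # about pi/2 versus about 3*pi/2
# The reciprocal 1/H_min = 1 + s is still stable, but 1/H_extra has a pole at s = +1.
```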







It is natural to begin with a discussion of the 
spectrum of a typical target path. Unfortu- 
nately no data on the spectra of actual meas- 
ured airplane paths exist, and the theoretical 
assumptions which may be made about paths 
of airplane targets are best discussed in the 
next chapter. This section consequently will be 
confined to rather general observations about 
the problem. It will be convenient to assume 
for definiteness that the quantities to be 
smoothed are the velocity components in Car- 
tesian coordinates. 

The simplest point of departure is furnished 
by the conventional assumption that the target 
flies in a straight line at constant speed. If we 
could construe this assumption literally, it 
would mean that the velocity spectrum in rec- 
tangular coordinates would reduce to a single 
line at zero frequency. In practice, of course, 
the spectrum is not so simple. Even in the 
absence of deliberate maneuvering, the target 
will fly a slightly curved path because of 
"wander." Moreover, even if the target could 
fly exactly straight, the single line spectrum 
would apply only to a straight course in- 
definitely continued. The spectrum becomes 
more complicated if we consider the fact that 
tracking must have begun at some finite time 
in the past, or that the target may presumably 
change occasionally from one straight line 
course to another. 

As a result of both these causes, the actual 
signal spectrum must be regarded as occupying 
a band bordering on zero frequency. The distri- 
bution of energy in detail will, of course, 
depend on particular circumstances. The band 
has no very well defined upper limit, but in 
most cases the great bulk, at least, of the 
energy should be below, say, one-fourth or one- 
fifth of a cycle per second. For example, the 
natural periods of a heavy airplane, which one 
would expect to be correlated with wander, are
below this limit. This limit is also sufficient to
include most of the energy resulting from 
changes in course occurring as frequently as 
every ten or twenty seconds. 

In general, it is to be supposed that the signal spectrum varies as $\omega^{-n}$, where n may be 1, 2, or 3, depending on the frequency range. This follows from general considerations of the limitations of airplane performance. Thus, if we suppose that the velocity changes discontinuously from time to time, it follows from general Fourier principles that the amplitude must vary as $\omega^{-1}$. This is presumably a fair representation of the actual signal spectrum at low frequencies. At moderate frequencies, however, we must take account of the fact that the velocity can actually be changed rapidly but not discontinuously, and we consequently assume that the amplitude begins to vary as $\omega^{-2}$. Finally, at frequencies of the order of perhaps one cycle per second one must take account of the fact that the airplane must bank in order to turn. Since it takes some time to roll into the bank, even the acceleration in the lateral direction cannot be discontinuous, and consequently the amplitude must begin to vary as $\omega^{-3}$. The application of such successive limiting factors in constructing a complete spectrum is described in more detail in Section A.8 of Appendix A.
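The $\omega^{-1}$ and $\omega^{-2}$ rules rest on a general Fourier fact: a jump in a function makes its harmonic amplitudes fall off as 1/k, and a jump only in its slope makes them fall off as 1/k². The sketch below checks this on two periodic test waves chosen for illustration: a square wave, standing in for a velocity that changes discontinuously, and a triangle wave, standing in for a continuous velocity whose rate of change jumps.

```python
import numpy as np

N = 4096
t = np.arange(N) / N                              # one period of unit length
jump = np.where(t < 0.5, 1.0, -1.0)               # the value itself jumps
kink = 1.0 - 4.0 * np.abs(t - 0.5)                # continuous, but the slope jumps


def harmonic_amplitudes(x):
    """Amplitude of each harmonic k = 0 .. N/2 of a periodic record."""
    return 2.0 * np.abs(np.fft.rfft(x)) / len(x)


k = np.arange(1, 16, 2)                           # odd harmonics (even ones vanish here)
amp_jump = harmonic_amplitudes(jump)[k]
amp_kink = harmonic_amplitudes(kink)[k]
print(k * amp_jump)        # nearly constant, about 4/pi: amplitude falls as 1/k
print(k ** 2 * amp_kink)   # nearly constant, about 8/pi**2: amplitude falls as 1/k**2
```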

One other general condition of the same kind can be mentioned. It can be shown that the integral

$$\int_{0}^{\infty} \frac{\log H(\omega)}{1+\omega^{2}}\,d\omega,$$

where H is the power spectrum, is very important in determining the properties of a time series. More explicitly, the integral converges if the series is essentially statistical, so that we cannot foretell the future from the past with absolute certainty. This of course is the case with an actual signal spectrum in a fire-control problem. It implies two consequences: first, that H cannot be zero over any finite band; and second, that in the neighborhood of infinite frequency H diminishes slowly enough so that $|\log H|/\omega \to 0$.
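The log-integral condition can be tried on two assumed spectra. For a rational spectrum $H = 1/(1+\omega^2)$ the integral converges, to $-\pi\ln 2$. For a Gaussian spectrum $H = e^{-\omega^2}$, which falls off faster than any power, the integrand tends to -1 and the integral diverges to $-\infty$, so by the criterion above such a series would not be essentially statistical. The sketch evaluates partial integrals with the trapezoidal rule.

```python
import numpy as np


def partial_log_integral(log_H, upper, n=1_000_001):
    """Trapezoidal estimate of the integral of log H(w) / (1 + w**2) from 0 to `upper`."""
    w = np.linspace(0.0, upper, n)
    g = log_H(w) / (1.0 + w ** 2)
    return float(np.sum((g[1:] + g[:-1]) * np.diff(w)) / 2.0)


def log_rational(w):
    return -np.log1p(w ** 2)        # H = 1 / (1 + w**2)


def log_gaussian(w):
    return -(w ** 2)                # H = exp(-w**2)


for upper in (10.0, 100.0, 1000.0):
    print(upper, partial_log_integral(log_rational, upper),
          partial_log_integral(log_gaussian, upper))
print("limit for the rational spectrum:", -np.pi * np.log(2.0))
```

The rational partial integrals settle toward about -2.18, while the Gaussian ones keep growing in magnitude roughly in proportion to the upper limit.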


The spectrum of tracking errors depends 
largely upon the particular sort of tracking 
equipment involved. Broadly speaking, optical 
tracking equipment (at least that of the present 
or recent past) tends to produce tracking errors 
not only of small amplitude, but also of low 
frequency, so that they are hard to separate 
from the signal spectrum. Radar equipment, of 
the present time, produces higher-frequency 
errors. Relatively high-frequency errors are 
particularly likely to be found in very stiff 
automatic tracking radars. 




A number of examples of spectra of tracking 
errors are shown in Figures 1, 2, and 3. The 
spectra are given directly in terms of range 
and angle errors. To make them comparable 
with the velocity spectra described previously it would be necessary to multiply all amplitudes by ω. In addition, it would of course also be necessary to multiply the angle rates by some suitable range in order to compare them directly with the yards-per-second rates we have otherwise considered.

Figure 1. Power spectrum of range errors of ex... (RMS = 30 yd; median = 0.022 cps).

After multiplication by ω, the radar spectra
appear to be about flat up to perhaps one cycle per second.
Beyond that point they no doubt drop off 
slowly, although the accuracy of the data is not 
sufficient to permit the situation to be stated 
very exactly. 



Figure 2. Power spectrum of angular height errors of experimental radar (RMS = 1.0 mil).

The properties of the signal and noise as we assume them here can be conveniently expressed by reference to the theory of so-called "random noise" functions.^h A random noise can be defined as a function which has a definite amplitude spectrum but completely random phase characteristics. The theory of such functions is well developed because of their frequent occurrence in physics. It is probable that neither our noise functions nor our signal functions are, strictly speaking, random noise according to this definition. Thus, there are probably certain definite phase relations in our noise functions because of the physical characteristics of tracking devices. There is no evidence, however, that any such relations are important enough to be significant in the data-smoothing problem, so that we are fully justified in identifying them with random noise functions as defined above. The phase relations in the signal are by no means random. As long as we consider only the mean square error, however, this factor is immaterial, and we can replace the actual signal by a random noise function with the same power spectrum for purposes of
The most familiar example of a random noise function is furnished by the thermal voltage across a resistance R. This is a random noise whose spectrum is constant up to very high frequencies with the value P = 4kTR (k is Boltzmann's constant and T the absolute temperature). A second example is black body radiation. If there is black body radiation in a space, the electric (or magnetic) field intensity at a point is a random noise function with

$$P(f) = \frac{8\pi h f^{3}}{c^{3}}\,\frac{1}{e^{hf/kT}-1}$$

according to Planck's law (h is Planck's constant and c the velocity of light). Random noise functions also occur in the Schottky effect, in Brownian motion, and in diffusion and heat flow problems.

h The fact that we also refer to tracking errors as "noise" is, of course, merely a coincidence.

Figure 3. Power spectrum of trav... (RMS = 1.4 mil).

For purposes of analysis, a random noise function can be thought of as a function made up of a large number of sinusoidal components, which are very closely spaced in frequency and whose phases are completely random [21, 23]. Thus a random noise can be represented as

$$f(t) = \sum_{n=1}^{\infty} a_n \cos(\omega_n t + \phi_n),$$

where $\omega_n = 2\pi n\,\Delta f$, $\Delta f$ being the frequency difference between adjacent components. The phase angles $\phi_n$ are random variables which are independent, with a uniform probability distribution from 0 to $2\pi$. As $\Delta f$ decreases, the functions in this ensemble approach, in a certain sense, a limiting ensemble, provided the amplitudes $a_n$ are adjusted properly. What is desired is to have the total power in the neighborhood of each frequency approach a certain limit P(f), the power spectrum at that frequency. To do this we make

$$a_n^{2} = 2P(f_n)\,\Delta f, \qquad f_n = n\,\Delta f.$$

In the limiting ensemble the total power within a small frequency range $\Delta f$ is then $P(f)\,\Delta f$. The function P(f) completely describes the random noise ensemble from the statistical point of view.
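The cosine-sum construction translates directly into code. This sketch uses an assumed spectrum and component spacing. It builds one member of the ensemble and confirms that its mean power equals $\sum_n P(f_n)\,\Delta f$, the discrete form of the statement that the power in a band $\Delta f$ is $P(f)\,\Delta f$. Averaging over exactly one period $1/\Delta f$ makes the cross terms between components vanish, so the agreement is exact up to rounding.

```python
import numpy as np


def random_noise(P, df, n_components, t, rng):
    """One member of the random-noise ensemble: a sum of cosines with
    a_n**2 = 2 P(f_n) df, f_n = n df and independent phases uniform on [0, 2 pi)."""
    f_n = df * np.arange(1, n_components + 1)
    a_n = np.sqrt(2.0 * P(f_n) * df)
    phi_n = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    phase = 2.0 * np.pi * np.outer(f_n, t) + phi_n[:, None]
    return (a_n[:, None] * np.cos(phase)).sum(axis=0)


def P(f):
    """Assumed power spectrum (per c/s), concentrated below about 0.2 c/s."""
    return 1.0 / (1.0 + (f / 0.2) ** 2)


df = 0.01                          # spacing of components, c/s
n_components = 200                 # components up to 2 c/s
T = 1.0 / df                       # one full period of the sum, s
t = np.arange(4096) * (T / 4096)
x = random_noise(P, df, n_components, t, np.random.default_rng(1))

mean_power = np.mean(x ** 2)
expected = np.sum(P(df * np.arange(1, n_components + 1))) * df
print(mean_power, expected)        # equal up to rounding
```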

A particularly important special case is that 
of a random noise with a constant power spec- 
trum. This is often called "flat" or "white" 
noise. True constancy out to infinite frequencies 
is of course impossible since it would imply an 
infinite total power in the function. The idea 
is, however, still useful and can be approxi- 
mated, as with resistance noise, by having a 
spectrum which is constant out to such high 
frequencies that behavior beyond this point is 
of no importance to the problem. We may con- 
veniently think of flat random noise as being 
made up of a succession of weak impulses oc- 
curring frequently but at random times with 
respect to one another. This results from the 
fact that a Fourier analysis of a single impulse 
gives a flat spectrum, and the random occur- 
rence of many of them produces a random set 
of phases. In a physical problem, such as resis- 
tance noise or Brownian motion, these im- 
pulses might correspond to the effects of indi- 
vidual small particles. Such a situation is of 
course completely chaotic. If the impulses are 
large and occur relatively infrequently, the 
power spectrum is still flat, though the func- 
tion is no longer a random noise function as 
defined here. This conception, which corre- 
sponds to a physical situation including definite 
causative elements, will be revived later under 
the name of the elementary pulse method of 

Random noise functions have a number of interesting characteristics. For example, they have the "ergodic property." This means that averaging a statistic along the length of a particular random function gives the same results as averaging the same statistic over an ensemble of functions having the same power spectrum. Each function is typical of the ensemble. To be more precise one must admit exceptions, but the probability of an exception is zero. For example, if we determine the fraction of time a given random function f(t) has a value greater than some constant A, it will be equal to the fraction of all functions in the ensemble which are greater than A at t = 0 (with probability 1).

A second characteristic of random noise functions is the fact that they frequently lead to Gaussian or normal law distributions. For example, the amplitudes of a random noise function are distributed about zero in accordance with the normal error law. Likewise, the amplitudes for two points spaced a given distance apart form a two-dimensional normal error law distribution when we consider all possible positions of the first point. It is apparent that if the signal and noise are actually random functions the mean square error is as good a criterion of performance as any other, since it completely fixes the distribution in a normal law case.
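The ergodic and normal-law properties can be checked together on a single long record. The sketch follows the same cosine-sum construction but assembles it with an inverse FFT for speed; the spectrum and sizes are assumptions. It measures the fraction of time $|f(t)|$ exceeds its rms value along one function and compares it with the normal-law probability of the same event, $\operatorname{erfc}(1/\sqrt{2}) \approx 0.317$. The two should agree to within sampling error.

```python
import math

import numpy as np


def random_noise_fft(P, df, N, rng):
    """Random-noise record of N samples spanning 1/df seconds.

    Component n (1 <= n < N/2) has frequency n*df, amplitude sqrt(2 P df) and a
    uniformly random phase, as in the cosine sum, but is summed with an inverse FFT.
    """
    n = np.arange(N // 2 + 1)
    a = np.zeros(N // 2 + 1)
    a[1:N // 2] = np.sqrt(2.0 * P(n[1:N // 2] * df) * df)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=N // 2 + 1)
    # irfft divides by N and doubles the positive-frequency terms, hence the factor N/2.
    return np.fft.irfft((N / 2) * a * np.exp(1j * phases), n=N)


def P(f):
    return 1.0 / (1.0 + (f / 0.2) ** 2)            # assumed spectrum, per c/s


x = random_noise_fft(P, df=1e-4, N=2 ** 18, rng=np.random.default_rng(2))
sigma = math.sqrt(np.mean(x ** 2))
time_fraction = float(np.mean(np.abs(x) > sigma))  # average along one function
normal_fraction = math.erfc(1.0 / math.sqrt(2.0))  # normal-law probability that |x| > sigma
print(time_fraction, normal_fraction)
```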

A final property of random noise functions is the fact that if a random noise is passed through a filter the output is still a random noise. If the power spectrum of the noise is $P(\omega)$ and the transfer characteristic of the filter is $Y(i\omega)$, the output spectrum is $P(\omega)\,|Y(i\omega)|^{2}$. In particular, since a differentiator has $Y(i\omega) = i\omega$, if we take the derivative of a random noise with spectrum $P(\omega)$ we obtain one with spectrum $\omega^{2}P(\omega)$.
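The filter rule $P(\omega)|Y(i\omega)|^2$ can be checked numerically. In this sketch, with an assumed spectrum and sampling, a random-noise record is passed through a one-sample difference divided by the sample step. This is a discrete stand-in for differentiation, and its $|Y|^2$ is $(2\sin(\omega\,\Delta t/2)/\Delta t)^2$. The output power matches $\sum P|Y|^2\Delta f$ exactly and comes within about 1% of the ideal-differentiator value $\sum \omega^2 P\,\Delta f$, since all components lie far below the sampling rate.

```python
import numpy as np


def P(f):
    return 1.0 / (1.0 + (f / 0.2) ** 2)      # assumed input spectrum, per c/s


N, df = 4096, 0.01                           # record of 1/df = 100 s
dt = 1.0 / (N * df)                          # sample step, s
n = np.arange(1, 200)                        # components well below the sampling rate
f_n = n * df
a_n = np.sqrt(2.0 * P(f_n) * df)
phi_n = np.random.default_rng(3).uniform(0.0, 2.0 * np.pi, size=n.size)
k = np.arange(N)
x = (a_n[:, None] * np.cos(2.0 * np.pi * np.outer(f_n, k * dt) + phi_n[:, None])).sum(axis=0)

# The filter: a circular first difference over one sample, a discrete stand-in for d/dt.
y = (x - np.roll(x, 1)) / dt
Y2 = (2.0 * np.sin(np.pi * f_n * dt) / dt) ** 2      # |Y|^2 of that difference filter
omega2 = (2.0 * np.pi * f_n) ** 2                    # |Y|^2 of an ideal differentiator

print(np.mean(y ** 2), np.sum(P(f_n) * Y2) * df, np.sum(P(f_n) * omega2) * df)
```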

This last property of random noise functions suggests a method of representing them which we shall find useful in the future. The method is represented by Figure 4. It consists of a source of flat noise followed by a shaping filter to give the desired power spectrum. We can easily assign to the filter the characteristics of a physically realizable structure by making use of the relations between attenuation and phase mentioned earlier in the chapter. It is merely necessary to convert the desired power spectrum into a specification of the attenuation characteristic of the filter and then use the loss-phase formula to compute the corresponding phase shift. It will be assumed that this procedure has been followed when we make use of this circuit at a later point.

Figure 4. Circuit representation of random functions.
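One way to carry out this procedure on a computer is sketched below. It assumes a discrete-time setting and uses the cepstral (homomorphic) method, a standard digital counterpart of the loss-phase formula rather than the report's own procedure: the log amplitude ½ ln P is folded onto positive time, which fixes the minimum phase, and the result is exponentiated. The function name `minimum_phase_from_power` and the test spectrum are illustrative.

```python
import numpy as np

def minimum_phase_from_power(power):
    """Given |Y|^2 sampled at w = 2*pi*k/N (k = 0..N-1, N even), return the
    frequency response Y and impulse response h of the minimum-phase filter
    with that power spectrum."""
    n = len(power)
    log_amp = 0.5 * np.log(power)              # ln|Y|, i.e. minus the attenuation in nepers
    cep = np.fft.ifft(log_amp).real            # real (even) cepstrum of ln|Y|
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1 : n // 2] = 2.0 * cep[1 : n // 2]   # move the negative-time half onto positive time
    fold[n // 2] = cep[n // 2]
    log_y = np.fft.fft(fold)                   # ln|Y| + i * arg Y, with arg Y of minimum phase
    y = np.exp(log_y)
    h = np.fft.ifft(y).real
    return y, h

N = 1024
# Power spectrum of the NON-minimum-phase filter [0.5, 1.0] (zero outside the unit circle).
target = np.abs(np.fft.fft([0.5, 1.0], N)) ** 2
y, h = minimum_phase_from_power(target)
print(np.round(h[:4], 6))   # the minimum-phase filter with the same spectrum
```

The target spectrum belongs to the filter [0.5, 1.0], whose reciprocal is unstable; the procedure returns approximately [1.0, 0.5], the minimum-phase filter with the same power spectrum and a stable, realizable reciprocal.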

The method of representing random functions shown by Figure 4 illustrates graphically
the basis of the prediction schemes described 
thus far. The flat noise is of course absolutely 
unpredictable. The history of the function up 
to any given instant gives no indication of its 
value even a microsecond later. The filter, how- 
ever, forces the output current to have a cer- 
tain structure on which a prediction may be 
based. For example, if the filter will pass only 
very low frequencies it is clear that the output 
can change very little in a microsecond. 


The signal and noise spectra furnish the raw 
material from which a suitable data-smoothing 
filter can be deduced. We have still to deter- 
mine, however, the exact rule for choosing the 
cutoff and attenuation characteristic of the 
filter from these spectra. It is clear that previ- 
ous experience with signal-to-noise problems 
in systems transmitting voice or music is no
help, since the filter proportions here depend 
upon psychological considerations of no rele- 
vance to the fire-control problem. For example, 
the interfering effect of a small amount of 
noise is much greater than one might expect 
from energy considerations, especially in in- 
tervals of low message level, and it is con- 
sequently worth while to maintain a relatively 
high level of attenuation in the noise band. 
Conversely, the breadth of the band required 
for the message depends as much on the ability 
of the ear to reconstruct a complete signal 
from an incomplete one as it does upon the 
actual signal power spectrum. 

In the data-smoothing case a suitable criterion, dependent upon more physical considerations, can be obtained by minimizing the rms error at the filter output. This criterion is easily developed from the power spectrum approach, and in a sense it is, of course, the only possible one as long as we follow the methods developed thus far.

A very general theory for the minimization of the rms error of the filter output has been developed by Wiener.¹ Since the power spectrum approach is not the one we shall eventually follow, however, it is not necessary to give this analysis in detail. The nature of the relationships can be seen from an elementary computation. In Figure 5 let OA be a vector representing the signal component at some particular frequency. Let the amplitude ratio between the input and output of the data-smoothing filter be x, and let it be assumed that the system is phase distortionless. This can always be accomplished, at the cost of lag, by phase equalization. Then the actual signal output can be represented by OB, where OB/OA = x. Let the ratio of noise power to signal power at this frequency be k². Then the output noise can be represented by the vector BC, at some arbitrary phase angle θ, where BC/OA = kx.

Figure 5. Vector relation between input and output of data-smoothing network.

The error in the output of the data-smoothing filter is evidently represented by the vector AC. We have

$$(AC)^2 = (OA)^2\left[(1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2\right] = (OA)^2\left[(1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2\right].$$

Since θ is random the cross-product term involving cos θ disappears on the average. (More generally, it disappears as long as the noise and signal are uncorrelated, whether or not their relative phases are entirely random.) This leaves the mean square error as

$$\overline{(AC)^2} = (OA)^2\left[1 - 2x + (1 + k^2)x^2\right]. \qquad (1)$$

The mean square error is a minimum if

$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S}, \qquad (2)$$

where P_S and P_N are, respectively, the signal and noise power at this frequency. Upon substituting this result in equation (1) and remembering that (OA)² = P_S, we find that the minimum mean square error is

$$\overline{(AC)^2}_{\min} = \frac{P_S P_N}{P_S + P_N}.$$
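Equations (1) and (2) can be checked by brute force. The sketch evaluates the mean square error (1) on a fine grid of gains x at one frequency, with hypothetical powers P_S = 3 and P_N = 1, and compares the numerical minimum with x = P_S/(P_N + P_S) and with the minimum error P_S P_N/(P_S + P_N).

```python
import numpy as np

def mean_square_error(x, p_signal, p_noise):
    """Equation (1) at one frequency: (OA)^2 [1 - 2x + (1 + k^2) x^2],
    with (OA)^2 = P_S and k^2 = P_N / P_S."""
    k2 = p_noise / p_signal
    return p_signal * (1 - 2 * x + (1 + k2) * x ** 2)

def optimum_gain(p_signal, p_noise):
    """Equation (2): x = 1 / (1 + k^2) = P_S / (P_N + P_S)."""
    return p_signal / (p_noise + p_signal)

p_s, p_n = 3.0, 1.0                              # hypothetical powers at one frequency
xs = np.linspace(0.0, 1.0, 100_001)
x_best_numeric = xs[np.argmin(mean_square_error(xs, p_s, p_n))]
x_best = optimum_gain(p_s, p_n)                  # 0.75
min_error = mean_square_error(x_best, p_s, p_n)  # P_S P_N / (P_S + P_N), 0.75 up to rounding
print(x_best_numeric, x_best, min_error)
```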

Equation (2) evidently represents the sought-for rule for the filter transmission characteristic. It is illustrated in Figure 6, where P_N and P_S have been chosen, respectively, as the flat curve and the 1/ω² curve in Figure 7. In comparison with the characteristics of typical filters in communication systems it is quite rounded, with a relatively slowly falling amplitude characteristic. More important than the detailed rule for the transmission characteristic, however, is the conclusion that the shape of the characteristic is not very critical. There is very little loss in replacing the actual curve in Figure 6 by any other similar characteristic. For example, we might validate the assumption of zero phase distortion by making use of the curve which automatically gives a linear phase shift.¹⁵⁰

Figure 6. Optimum transmission characteristic for data smoothing assuming signals with random noise characteristics.

Figure 7. Signal and noise spectra assumed in Figure 6.

A more extreme illustration is furnished by 
the infinitely selective filter characteristic, with 
perfect transmission in the range in which the 
signal power is greater than the noise power, 
and zero transmission elsewhere, indicated by 
the broken lines in Figure 6. 

It follows from equation (1) that in the 
neighborhood of the cutoff point ω₀ the mean
square error for this filter is twice that of the 
optimum structure. In most frequency ranges, 
however, the penalty is far less than this. Since 
even a two-to-one change in the mean square 
error would produce no tremendous improve- 
ment in the effectiveness of fire, it is clear that 
the result to which we are led by this method 
of attack is by no means critical. 
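The factor of two can be seen directly. At any one frequency the infinitely selective filter uses x = 1 where P_S > P_N and x = 0 elsewhere, so by equation (1) its error is the smaller of P_S and P_N, while the optimum error is P_S P_N/(P_S + P_N). The ratio is therefore 1 + min(P_S, P_N)/max(P_S, P_N), which reaches 2 only where the spectra cross. The sketch below evaluates it for an assumed flat noise spectrum and a 1/ω² signal spectrum crossing at ω₀ = 1; these spectra are illustrative only.

```python
import numpy as np

def ideal_filter_error(p_s, p_n):
    """Infinitely selective filter: x = 1 where P_S > P_N, else x = 0.
    Equation (1) then leaves an error of P_N (x = 1) or P_S (x = 0)."""
    return np.where(p_s > p_n, p_n, p_s)

def optimum_error(p_s, p_n):
    """Minimum of equation (1): P_S P_N / (P_S + P_N)."""
    return p_s * p_n / (p_s + p_n)

# Hypothetical spectra: flat noise and a 1/w^2 signal, equal at w0 = 1.
w = np.logspace(-2, 2, 401)
p_n = np.ones_like(w)
p_s = 1.0 / w ** 2
penalty = ideal_filter_error(p_s, p_n) / optimum_error(p_s, p_n)
print(penalty.max(), penalty[0], penalty[-1])
```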


The analysis just concluded has been directed at the amplitude characteristics of a data-smoothing filter. By virtue of the relations between the amplitude and phase characteristics of physical networks mentioned earlier in the chapter, however, the analysis permits us to give at least a partial description also of the phase characteristics of the filters. This is an important consideration because it bears upon the question of time delays in data-smoothing systems which was mentioned in Chapter 7.

Figure 8. Some filter attenuation characteristics.

The general nature of the relationship in simple cases is illustrated by Figures 8 and 9. Figure 8 shows a series of rising attenuation characteristics equivalent to rather unselective falling amplitude characteristics of the general type shown by the principal curve in Figure 6. Figure 9 shows the corresponding phase characteristics computed on a minimum phase shift basis. In Figure 8 the central attenuation characteristic B has been so chosen that the corresponding phase characteristic in Figure 9 is exactly a straight line at low frequencies, where the transmitted amplitudes are appreciable. Curves A and C in the two drawings show slightly different cases, but it is clear from the figures that the tendency of the phase characteristics to approximate linearity is still evident.

Figure 9. Corresponding minimum phase characteristics.
In communication engineering a phase characteristic proportional to frequency is interpreted as indicating a delay in seconds equal to the slope dB/dω of the phase characteristic. This relation is illustrated most simply by an ideal line. The ideal line has zero attenuation combined with a phase shift which is proportional to frequency and which at any given frequency is also proportional to the length of the line in question. If we apply any arbitrary wave to the line it is propagated down the line with a definite velocity and unchanged wave form. The time required for the wave to reach any point on the line is equal to the slope of the phase characteristic to that point.
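The ideal-line relation can be illustrated with a discrete Fourier transform. In this sketch a hypothetical helper, `apply_linear_phase`, multiplies the spectrum of an arbitrary sampled wave by exp(−iωτ), a phase lag with constant slope τ and no attenuation; the output is the input shifted τ samples later with its wave form unchanged. The FFT makes the shift circular, which is an artifact of the finite record, not of the line.

```python
import numpy as np

def apply_linear_phase(signal, delay):
    """Ideal line: zero attenuation and a phase lag B(w) = w * delay.
    The FFT treats the record as periodic, so the shift wraps around."""
    n = len(signal)
    w = 2 * np.pi * np.fft.fftfreq(n)              # radians per sample
    spectrum = np.fft.fft(signal) * np.exp(-1j * w * delay)
    return np.fft.ifft(spectrum).real

rng = np.random.default_rng(1)
wave = rng.standard_normal(128)                    # an arbitrary applied wave
out = apply_linear_phase(wave, delay=5)            # phase slope dB/dw = 5 samples
print(np.allclose(out, np.roll(wave, 5)))          # same wave form, 5 samples later
```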

In a structure like a filter, which has an at- 
tenuation characteristic varying with fre- 
quency, it is of course no longer possible to 
transmit an arbitrarily impressed wave with- 
out change in wave shape. Even if the applied 
wave is merely a suddenly applied d-c voltage 
or single frequency sinusoid, there is a tran- 
sient period before the response approximates 
its final value. In structures having a substan- 
tially linear phase characteristic over any fre- 
quency range in which they exhibit an appreci- 
able amplitude response, however, this total 
transient characteristic falls naturally into two 
parts. The first is a waiting period equal to the 
slope of the phase characteristic, during which 
the response is very small, whereas the second 
is a true transient period in which the response 
is substantial but does not resemble the final 
steady-state response. This is illustrated by Figure 10, which shows the voltage at the fifth section of a conventional low-pass filter in response to a d-c voltage applied at zero time at the input terminals. The end of the waiting period, as deduced from the slope of the phase characteristic, is indicated by the broken line.

Figure 10. Voltage at fifth section of conventional low-pass filter in response to unit d-c voltage.
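A rough numerical analogue of Figure 10 follows, assuming a cascade of five identical first-order lag sections in discrete time (not the conventional filter sections of the figure). It estimates the phase slope at zero frequency, compares it with the centre of gravity of the impulse response (for any real response with positive d-c gain the two coincide), and forms the step response. With a = 0.8 each section contributes a/(1 − a) = 4 samples, so the slope is 20 samples; the step response is only about 2 per cent at 4 samples and has essentially settled by 60.

```python
import numpy as np

a, sections, n = 0.8, 5, 400
single = (1 - a) * a ** np.arange(n)        # impulse response of one first-order lag section
h = np.array([1.0])
for _ in range(sections):
    h = np.convolve(h, single)[:n]          # cascade; truncation does not alter samples < n

# Phase lag B(w) = -arg H(w); a small finite difference gives dB/dw at w = 0.
dw = 1e-4
H = np.sum(h * np.exp(-1j * dw * np.arange(n)))
phase_slope = -np.angle(H) / dw             # in samples; theory: 5 * a / (1 - a) = 20

centroid = np.sum(np.arange(n) * h) / np.sum(h)   # centre of gravity of the response
step = np.cumsum(h)                         # response to a suddenly applied unit d-c input
print(phase_slope, centroid, step[4], step[20], step[60])
```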

Delays of the sort just illustrated must be 
expected in a data-smoothing filter whenever 
the nature of the signal is changed. This hap- 
pens at the beginning of tracking, in changing 
from one target to another, or even in follow- 
ing a single target when the target makes an 
abrupt change in course. Since usable data in 
a fire-control system must be quite accurate, 
the delay to be allowed for must include both 
the initial waiting period and the subsequent transient period until the transient ripples have almost vanished. A considerable part of the art of designing data-smoothing networks consists in controlling the design so that these final transient ripples decay relatively rapidly. We are not yet ready to discuss this problem. It will turn out, however, that the minimum interval which can be assigned to the "true transient" period is about equal to that which must be allowed for the initial waiting period.ᶜ Thus the slope of the phase characteristic can be used as an index of the lags which must be expected in data smoothing merely by doubling the delay to which the slope would normally be said to correspond.

When we use the phase slope as an index of delay it becomes immediately apparent that lags are the necessary consequence of smoothing in physical circuits. This is easily seen by reference to the relations which must exist between attenuation and phase characteristics in physical structures. An example is provided by the formula

$$\left(\frac{dB}{d\omega}\right)_{\omega=0} = \frac{2}{\pi}\int_0^{\infty}\frac{A - A_0}{\omega^2}\,d\omega, \qquad (3)$$

where A is attenuation, A₀ is the attenuation at zero frequency, and B is phase shift. In other words, the delay (measured by the slope of the phase characteristic at zero frequency) is proportional to the integral of the attenuation on an inverse frequency scale when the attenuation at zero frequency is taken as the reference. The equation thus states that the system will exhibit a lagging response as long as there is a net high-frequency attenuation. As a numerical illustration, let it be supposed that A is zero below ω = 1. This corresponds to the estimate made earlier in the chapter that the input signal components in antiaircraft work lie roughly in the band below about 0.1 or 0.2 cycle per second. Let it be supposed also that A at higher frequencies is equal to 3 nepers, corresponding to an average amplitude reduction of about 20 to 1. Then, since the integral of dω/ω² from ω = 1 to infinity is unity, dB/dω at the origin is given from equation (3) as (2/π)(3) = 6/π seconds, and in accordance with the rule just enunciated the minimum delay to be expected from such a structure in a data-smoothing application would consequently be 12/π seconds.

ᶜ This is not intended to imply that the distinction between the initial waiting period and the "true transient" period is quite as sharp as it is in Figure 10. The selectivity in a data-smoothing filter is usually not great enough to justify the assumption that components beyond the linear phase region are of negligible importance.
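The numerical illustration can be reproduced by integrating equation (3) directly. Substituting u = 1/ω turns the integral against dω/ω² into an ordinary integral of A − A₀ over u, which is what the text means by an inverse frequency scale. The sketch assumes the square-cornered characteristic of the example (0 nepers below ω = 1, 3 nepers above) and a simple trapezoidal rule; the helper names are hypothetical.

```python
import numpy as np

def zero_frequency_phase_slope(attenuation, a0=0.0, u_max=50.0, points=500_001):
    """Equation (3): (dB/dw) at w = 0 = (2/pi) * integral of (A - A0) dw / w^2.
    With u = 1/w this is (2/pi) * integral over u of (A(1/u) - A0) du.
    Assumes A(w) = A0 below w = 1/u_max; uses the trapezoidal rule."""
    u = np.linspace(1e-9, u_max, points)
    integrand = attenuation(1.0 / u) - a0
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))
    return 2.0 / np.pi * integral

def step_attenuation(w):
    """The text's example: 0 nepers below w = 1 rad/s, 3 nepers above."""
    return np.where(w > 1.0, 3.0, 0.0)

slope = zero_frequency_phase_slope(step_attenuation)   # close to 6/pi s
minimum_lag = 2 * slope                                # close to 12/pi s (doubling rule)
print(slope, 6 / np.pi, minimum_lag, 12 / np.pi)
```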

Aside from such specific quantitative rela- 
tions equation (3) is useful as a basis for a 
number of important qualitative conclusions. 
One, for example, is the fact that although a 
lag is a necessary concomitant of any system 
showing a high-frequency attenuation, the 
amount of the lag depends greatly upon the 
portion of the frequency spectrum in which 
the attenuation is found. Since the integral is 
taken on an inverse frequency scale, a small 
attenuation at low frequencies is much more 
important than a considerably greater attenua- 
tion further out in the spectrum. This points to 
the desirability of designing tracking instru- 
ments which generate principally high-fre- 
quency noise, even if the amplitude of the noise 
is somewhat increased thereby. We may also 
notice that since the attenuation is a logarith- 
mic function of amplitude an initial moderate 
reduction in the amplitude of disturbing noise 
may be much less expensive in lag than subse- 
quent attempts at further reduction. For ex- 
ample, an amplitude reduction from 100 to 10 
per cent over a given portion of the frequency 
spectrum produces no more lag than a subse- 
quent reduction from 10 to 1 per cent. 
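Both qualitative conclusions follow from the same integral. In the sketch below, a constant loss of one neper over the band 1 to 2 contributes ten times as much phase slope as the same loss over 10 to 20, because the weight 1/ω² is a hundred times smaller in the upper band while that band is only ten times wider. Reductions from 100 to 10 per cent and from 10 to 1 per cent are each ln 10 ≈ 2.3 nepers, so over the same band they cost the same lag. The band edges and the helper name `band_phase_slope` are illustrative assumptions.

```python
import numpy as np

def band_phase_slope(nepers, w_low, w_high):
    """Contribution to equation (3) of a constant attenuation `nepers`
    confined to [w_low, w_high]: (2/pi) * nepers * (1/w_low - 1/w_high)."""
    return 2.0 / np.pi * nepers * (1.0 / w_low - 1.0 / w_high)

low_band = band_phase_slope(1.0, 1.0, 2.0)     # 1 neper just above the signal band
high_band = band_phase_slope(1.0, 10.0, 20.0)  # the same loss a decade higher
print(low_band / high_band)                    # the low band costs ten times the lag

first_cut = np.log(100 / 10)   # amplitude 100 % -> 10 %, in nepers
second_cut = np.log(10 / 1)    # amplitude 10 % -> 1 %: the same ln 10 nepers
```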


In Chapter 7 we distinguished between what 
we called the simple data-smoothing problem 
and the data-smoothing and prediction prob- 
lem. The simple problem, with which this re- 
port is chiefly concerned, is the one which has 
been given principal attention thus far. On 
account of its broad interest, however, it seems 
worth while to include also a brief statement 
of Wiener's solution of the general problem. 
The method of development used here is intui- 
tive and nonrigorous in comparison with 
Wiener's own development, but it permits the 
principal relations to be established by very 
elementary means. 

It is convenient to consider first the zero 
noise case. The past history of the signal, then, 

is known perfectly, and the existence of a 
prediction problem depends entirely upon the 
fact that since the signal is assumed to be sta- 
tistical in character, its future is not com- 
pletely determined from its past. The situation 
can be thought of in the terms suggested by 
Figure 11. The actual signal output appears at P₁. In accordance with the discussion earlier in the chapter, we imagine this signal to be generated by passing flat noise through the shaping network N₁. The transfer admittance Y₁(iω) of N₁ is determined from the power spectrum of the signal by the procedure outlined earlier and is a minimum phase shift characteristic. It will be recalled that minimum phase shift transfer admittances have the important property that their reciprocals are also the transfer admittances of physically realizable networks.

Figure 11. Schematic representation of Wiener's prediction theory when there is no noise.

From Y₁ we can readily compute the transient response characteristic of N₁. We shall assume for illustrative purposes that the impulsive admittance of N₁ takes the special shape shown by Figure 12.

Figure 12. Assumed impulsive admittance of 
shaping filter. 

The flat noise is thought of as consisting of 
a large number of elementary impulses with 
random amplitudes and occurring at random 
times. For the purposes of this analysis, how- 
ever, it is sufficient to consider only the three 
unit impulses shown in Figure 13. Impulse B is supposed to occur at the instant at which the prediction is to be made, A occurs two seconds in the past, and C one second in the future. The response of N₁ to these three impulses will evidently be three curves of the sort given by Figure 12, suitably displaced in time as shown by Figure 14.

Figure 13. Impulses giving rise to applied signal through shaping filter.

The desired output of the predicting network is the curve of Figure 14 advanced by the prediction time, which we can assume, for illustration, to be two seconds. It may be assumed for the sake of preliminary analysis that the input of the predicting network is the three original impulses of Figure 13. The terminal P₂ at which they are supposed to appear is of course a purely fictitious one and is not accessible to us physically. We can, however, construct the equivalent terminal P'₂ by imposing the actual signal from terminal P₁ on the network N₂, whose transfer admittance is the reciprocal of that of N₁.

Figure 14. Applied signal at P₁.

Let the predicting network connected to terminal P'₂ be represented by N₃. Obviously a perfect prediction would be secured if N₃ could be assigned the impulsive admittance shown in Figure 15, that is, an impulsive admittance equal to the impulsive admittance of the original network but moved forward by the 2-second prediction time. Then all the constituent curves and the sum curve in Figure 14 would similarly be moved forward. Of course we cannot assign N₃ an impulsive admittance which is different from zero at negative times without postulating a nonphysical network. It is, however, perfectly possible to define N₃ from the portion of the impulsive admittance characteristic at positive times, with the remainder set equal to zero. This gives an impulsive admittance of the type shown by Figure 16. When energized by the three unitary impulses, it gives the result shown in Figure 17. The contributions of impulses A and B are not affected by the absence of a negative time portion of the impulsive admittance, but the contribution of impulse C is lost.

Figure 15. Ideal impulsive admittance of prediction network N₃ in Figure 11.
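A discrete-time sketch of the noise-free construction of Figure 11 follows. It assumes a short, hypothetical minimum-phase shaping filter `w` (the sampled impulsive admittance of N₁), recovers the flat-noise impulses with a recursive inverse filter playing the part of N₂, and predicts `steps` samples ahead with the advanced and truncated response playing the part of N₃. The empirical mean square error is compared with the sum of w² over the prediction interval, the discrete counterpart of the error estimate the text derives for the lost impulses.

```python
import numpy as np

def whiten(x, w):
    """Role of N2: recover the flat-noise impulses e from x = w * e by recursive
    inverse filtering. Assumes w[0] != 0 and w minimum phase, so 1/W is stable."""
    e = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for j in range(1, min(len(w), n + 1)):
            acc -= w[j] * e[n - j]
        e[n] = acc / w[0]
    return e

def predict(x, w, steps):
    """Role of N3: the impulse response w advanced by `steps` samples, with the
    part pushed to negative time discarded. xhat[n] estimates x[n + steps]."""
    tail = w[steps:]
    if len(tail) == 0:                      # everything lies in the prediction interval
        return np.zeros(len(x))
    return np.convolve(whiten(x, w), tail)[: len(x)]

rng = np.random.default_rng(2)
w = np.array([1.0, 0.6, 0.3])               # hypothetical minimum-phase response of N1
e = rng.standard_normal(20_000)             # flat noise of unit power (fictitious terminal)
x = np.convolve(e, w)[: len(e)]             # the actual signal at P1

for steps in (1, 2):
    xhat = predict(x, w, steps)
    mse = np.mean((x[steps:] - xhat[:-steps]) ** 2)
    lost = np.sum(w[:steps] ** 2)           # impulses in the prediction interval are lost
    print(steps, round(mse, 3), lost)
```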

To formulate a physical prediction network we have merely to find by conventional methods the steady-state admittance Y₃ corresponding to the impulsive admittance of Figure 16. The two networks N₂ and N₃ may then be combined to give a single structure with the transfer admittance Y₂Y₃ = Y₃/Y₁, which will give the complete prediction when energized by the actual signal.

Figure 16. Realizable portion of required impulsive admittance.

Figure 17. Response of realizable prediction network.

The mean square error in prediction is easily determined from the fact that the contributions of all impulses of the sort represented by C, occurring in the prediction interval, are lost. Since impulses in the flat noise source occur at random times, the mean square error is proportional to

$$\int_0^{\alpha} W^2(\tau)\,d\tau,$$

where α is the prediction time and W is the impulsive admittance of Figure 16. Since the flat noise impulses occurring after the time at which the prediction is made are surely unpredictable, it is clear that this error is the least we could expect any physical prediction network to have.

When the input data includes noise as well as the signal it is natural to think of the situation in the manner shown by Figure 18. The first source of flat noise, together with the shaping network N_a, is the combination we have already used to represent the signal in the noise-free case. The addition of noise is represented by the second independent source of flat noise with its associated shaping network N_b. They combine to give the total input measured at P₁.

Figure 18. Circuit representation of random functions representing signal and noise.

This diagram emphasizes the fact that we think of the noise and signal as originating from different physical sources. By postulate, however, we are not able to separate the sources experimentally. So far as any observed result is concerned, consequently, we may as well deal with the simplified structure shown in Figure 19, which contains a single source of flat noise and a single shaping network. The transfer admittance of the shaping network N₁ is determined by adding the power spectra of signal and noise, converting the result to an amplitude characteristic, and computing the corresponding minimum phase according to methods already used for the noise-free case.ᵈ

Figure 19. Schematic representation of Wiener's prediction theory when there is noise.
Although we cannot separate the signal from the noise completely, we saw earlier that the mean square difference between the total input and the signal is minimized if we multiply the amplitude of the input at each frequency by the ratio of the signal power to the sum of the signal and noise powers. A fictitious filter having the prescribed amplitude characteristic is represented by N₄ in Figure 19. We assigned N₄ a zero phase characteristic so that there may be no lag in producing the result at P₃. Thus the output at P₃ at any instant represents the best conceivable estimate (in the least squares sense) of the signal at that instant. The assumption of zero phase, of course, makes N₄ nonphysical, since it must have at least the minimum phase characteristic associated with its prescribed amplitude characteristic. This, however, is not an objection here since the structure is introduced purely for purposes of analysis.

ᵈ Note that the shaping network thus obtained is not the same as the one we would secure by adding the transfer admittances of N_a and N_b in Figure 18 directly. In order to realize the same total power at P₁ in each case, it is necessary to begin by adding the powers rather than the amplitude characteristics associated with the two paths.
The situation is now reduced to a form in which it is substantially equivalent to the one appearing in the zero-noise case. We assume a series of random impulses at P₂, which would produce responses at P₃. The problem is that of advancing the response to each impulse so that the same result appears α seconds earlier at terminal P₄. The solution is represented by networks N₂ and N₃, which discharge functions similar to those of the correspondingly labeled networks in Figure 11. Thus, the network N₂ is the reciprocal of N₁ and is provided to make terminal P'₂ equivalent to P₂ as a source of impulses. Network N₃ is defined by an impulsive admittance obtained from the impulsive admittance between P₂ and P₃ by advancing the latter characteristic α units in time and then discarding the portion at negative time.

In this procedure there is only one point at which the situation differs from that without noise. In the noise-free case, the original impulsive admittance which we wished to advance in time was identically zero at negative times. In order to secure a physically realizable result, we needed only to discard the portion of the impulsive admittance between t = 0 and t = α. In the present situation, on the other hand, the impulsive admittance is taken from a path including the nonphysical network N₄. Thus the admittance may be expected to take such form as that shown in Figure 20, with nonzero amplitudes at both negative and positive times, and in order to secure a physical final network it is necessary to discard everything to the left of the line α.

Figure 20. Typical impulsive admittance of best smoothing network N₄ in Figure 19.

This difference in the impulsive admittance 
characteristics has two consequences. The first 
is the fact that since the uncertainty of the 
prediction is measured by the amount of im- 
pulsive admittance which must be discarded, 
it is evidently greater in the present case where 
we are discarding much more. The second is 
the fact that in the noise-free case uncertainty 
exists only for a positive prediction time. A 
negative prediction time, which corresponds, of 
course, to the determination of the value as- 
sumed by the signal at some time in the past, 
can be set into the analysis as easily as a posi- 
tive prediction time, merely by shifting the im- 
pulsive admittance to the right rather than the 
left. In the noise-free case, however, there is 
nothing to be discarded when we shift to the 
right, since the impulsive admittance with 
which we begin is in any case identically zero 
for negative times. Thus the uncertainty in 
the determination of any past value of the sig- 
nal is zero. Since we have postulated no noise 
to confuse the data, this is, of course, an 
inevitable result. As soon as noise is included, on the other hand, there is no such sharp distinction between the future and the past.ᵉ The uncertainty in the determination of the true value of the signal in the near past is almost as great as it is in estimating what the signal will be in the near future. As we go further and further into the past the uncertainty gradually diminishes. If we can allow ourselves unlimited lag, we at length reach a point at which the discarded portion of the impulsive admittance characteristic is negligibly small. This, however, does not mean that all uncertainties have disappeared, but merely that we can base our estimate of the signal upon the power-ratio rule developed previously.

ᵉ This statement is to be understood in a physical rather than a mathematical sense. It is not intended to imply that there may not be sharp changes of behavior in the impulsive admittance at zero.
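The behaviour just described can be seen in a small finite-length example. The sketch is not Wiener's continuous solution: it assumes a hypothetical first-order autoregressive signal of unit power, added white noise, and a 60-tap FIR estimator found from the normal equations. It returns the least-squares error for estimating the signal `delay` samples in the past, with negative `delay` meaning prediction. The error falls steadily as the allowed lag grows and then levels off, as the text describes; with the noise removed, the one-step prediction error reduces to 1 − a², while filtering becomes exact.

```python
import numpy as np

def fir_wiener_mse(delay, a=0.9, noise_var=1.0, taps=60):
    """Least-squares error of the best FIR estimate of s[n - delay] from the
    noisy samples x[n], x[n-1], ..., x[n-taps+1].

    Hypothetical model: s[n] = a*s[n-1] + u[n], scaled to unit power, observed
    as x = s + v with white noise v of power noise_var.
    delay < 0: prediction; delay = 0: filtering; delay > 0: smoothing with lag."""
    def r_s(k):
        return a ** np.abs(k)                               # signal autocorrelation

    i = np.arange(taps)
    R = r_s(i[:, None] - i[None, :]) + noise_var * np.eye(taps)
    p = r_s(delay - i)                                      # E[x[n-i] s[n-delay]]
    h = np.linalg.solve(R, p)                               # normal equations
    return 1.0 - p @ h

errors = {d: fir_wiener_mse(d) for d in (-4, -2, 0, 2, 5, 15, 25)}
for d, err in errors.items():
    print(f"delay {d:+3d}: mean square error {err:.4f}")
```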


It has been fairly easy to develop a qualitative picture of the general characteristics of typical data-smoothing networks. As we have seen, they have amplitude characteristics of the low-pass filter type combined with lagging phase shifts. No corresponding qualitative picture of the characteristics of a typical overall predicting circuit has, however, been developed as yet. The discussion just concluded provides a rule for determining the characteristics of a predicting circuit in any given case, but provides comparatively little in the nature of a description of the result we may expect to obtain.

In any particular situation we can, of course, 
calculate the overall characteristics of the pre- 
dicting circuit. A simpler way of character- 
izing the overall predictor characteristic quali- 
tatively, however, is based upon the use of the 
attenuation-phase relations for physical net- 
works. We need merely use such an equation 
as (3) backward. Thus, we have previously 
shown that a positive phase slope corresponds 
to a lagging output. Correspondingly, a negative phase slope can be interpreted to represent a lead, or in other words, a prediction.ᶠ

If we assign the zero-frequency phase slope (dB/dω at ω = 0) in equation (3) a negative value, we see that A − A₀ must on the average be negative. In other words, the amplitude characteristic of an overall prediction circuit must rise, on the average, as we proceed upward from zero frequency. This is in marked contrast to a data-smoothing network, which, as we have seen, tends to have a low-pass filter type of characteristic with a falling amplitude characteristic at high frequencies. The increased amplitude of response may have two detrimental effects. In the first place, it evidently produces a distorting effect on any signal components to which it applies. In the second place, it produces an exaggerated response to noise.

Examples of the characteristics of overall prediction circuits are readily constructed by reference to the circuit of Figure 21. Various particular results are obtained by assigning particular characteristics to the data-smoothing network. Thus, if the data-smoothing network is absent entirely, the transmission through the path containing the differentiator is iωt_f, since differentiation is equivalent to multiplication by iω. The attenuation of the overall circuit is consequently A = −log|1 + iωt_f|. This is plotted as curve I of Figure 22. The increasing amplitude characteristic at high frequencies is obviously due fundamentally to the increased transmission through the differentiator circuit.

Figure 21. One-dimensional prediction circuit with data-smoothing networks.

ᶠ This, of course, does not mean that a network with a negative phase slope can predict a perfectly arbitrary event. We can hope to realize a negative phase slope, in combination with a flat amplitude characteristic, over only a finite band. The spectrum of an arbitrary event, that is, any suddenly applied signal, will always include important components running out to infinite frequency, where the negative phase slope can no longer be realized. The statement does, however, mean that if we suddenly apply a signal made up of one or more low-frequency sinusoids, and wait for the steady state to become established, the output will appear to lead the input by a time equal to the slope of the negative phase characteristic.

If the data-smoothing network is assigned the characteristic (1 + iωα)⁻¹, corresponding to a very simple low-pass filter type of response, the overall transmission becomes that shown by curve II in Figure 22. (It is assumed that α = t_f, for simplicity.) The negative attenuation at high frequencies is much reduced. This is paid for by an increased amplitude of response at low frequencies, but since the integration in (3) takes place on an inverse frequency scale, the low-frequency increment is much less than the gain reduction at high frequencies. Curve III shows the result when the data-smoothing network is assigned the characteristic (1 + iωα)⁻². Finally, curve IV shows the result obtainable when there is also a filter in the present-position circuit (as shown by the broken lines in Figure 21), so that there may be a net positive attenuation at high frequencies.

Figure 22. Attenuation characteristics of prediction circuit shown in Figure 21.
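Equation (3) implies that curves I, II, and III all carry the same lead. The sketch assumes, as the text suggests but does not state outright, that Figure 21 adds a present-position path to t_f times a differentiator, with the smoothing network (1 + iωα)⁻ⁿ in the derivative path only and α = t_f. Integrating each attenuation on a logarithmic frequency grid gives a zero-frequency phase slope of −t_f in every case: smoothing reshapes the gain but cannot remove the area that the prediction demands. Curve IV is omitted because its present-position filter is not specified.

```python
import numpy as np

def attenuation(w, t_f, n_smooth):
    """A = -ln|1 + i w t_f / (1 + i w t_f)^n_smooth|: a present-position path plus
    t_f times a differentiator, smoothed by n_smooth simple lags with alpha = t_f
    (an assumed reading of Figure 21)."""
    s = 1j * w
    return -np.log(np.abs(1 + s * t_f / (1 + s * t_f) ** n_smooth))

def phase_slope(t_f, n_smooth):
    """Equation (3) with A0 = 0, evaluated on a logarithmic frequency grid:
    integral of A dw / w^2 = integral of (A / w) d(ln w)."""
    lnw = np.linspace(-12.0, 12.0, 240_001)
    w = np.exp(lnw)
    f = attenuation(w, t_f, n_smooth) / w
    return 2.0 / np.pi * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(lnw))

t_f = 2.0                                             # prediction time, seconds
slopes = {n: phase_slope(t_f, n) for n in (0, 1, 2)}  # curves I, II, III
print(slopes)   # each close to -t_f: the same 2-second lead
```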

In view of the inverse frequency scale in (3), 
the gross negative attenuation will be mini- 
mized if the negative attenuation region is 
placed very close to zero frequency. This, how- 
ever, means that much of the signal energy 
falls in the negative attenuation region so that 
in certain respects, at least, the signal response 
must be seriously injured. For example, in the 
specific circuits just discussed we can place the 
negative attenuation region at very low fre- 
quencies by choosing very long time constants, 
a, in the data-smoothing networks, with the 
consequence that the circuits will operate cor- 
rectly for any long continued straight line path, 
but will be very sluggish in changing from one 
straight line to another. If the negative attenu- 
ation region is placed at higher frequencies, on 
the other hand, the signal response is improved 
but beyond certain limits the circuit becomes 
unbearably sensitive to noise. 

Quantitative illustrations of these relationships are quickly constructed. Suppose, for example, that the prediction time is 2 seconds. From (3) this is consistent with an attenuation characteristic having zero attenuation below ω = 1 and a net gain of π nepers thereafter. In other words, the amplitudes of all frequencies above ω = 1 are increased by a factor of about 22 to 1. If the region of added gain is pushed to a higher frequency or concentrated within a narrow band, the multiplying factor rapidly becomes larger. For example, if we maintain A at approximately zero below ω = 2, the average gain above this point must be 2π nepers, corresponding to a multiplying factor of 600 to 1. We secure the same factor by attempting to concentrate the region of negative attenuation in the band between ω = 1 and ω = 2. The multiplying factor also goes up rapidly as we increase the prediction time. For example, with the gain uniformly spread over the frequency region above ω = 1, the multiplying factor is 500 for a prediction time of 4 seconds, or more than 10,000 for a prediction time of 6 seconds.
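The multiplying factors in this paragraph can be checked with a few lines of arithmetic. This is a sketch under an assumption inferred from the paragraph's own first example rather than from relation (3) itself, which is not reproduced here: the integral of the net gain G (in nepers) weighted by 1/ω² is taken to equal (π/2) times the prediction time, so that π nepers above ω = 1 corresponds to 2 seconds. For a square-cornered gain G between ω₁ and ω₂ that integral is G(1/ω₁ − 1/ω₂), and the amplitude factor is e^G. The helper names are illustrative only.

```python
import math

def band_integral(w_low, w_high=math.inf):
    """Integral of 1/omega**2 from w_low to w_high."""
    return 1.0 / w_low - (0.0 if math.isinf(w_high) else 1.0 / w_high)

def required_gain(t_f, w_low, w_high=math.inf):
    """Uniform net gain in nepers over [w_low, w_high] for prediction time t_f.

    Assumes integral(G / omega**2) = (pi / 2) * t_f, calibrated on the text's
    example of pi nepers above omega = 1 for a 2-second prediction.
    """
    return (math.pi / 2.0) * t_f / band_integral(w_low, w_high)

def prediction_time(gain, w_low, w_high=math.inf):
    """Inverse relation: prediction time implied by a uniform gain band."""
    return gain * band_integral(w_low, w_high) * 2.0 / math.pi

cases = [
    ("2 s, gain above w = 1", 2.0, 1.0, math.inf),
    ("2 s, gain above w = 2", 2.0, 2.0, math.inf),
    ("2 s, gain in 1 < w < 2", 2.0, 1.0, 2.0),
    ("4 s, gain above w = 1", 4.0, 1.0, math.inf),
    ("6 s, gain above w = 1", 6.0, 1.0, math.inf),
]
for label, t_f, w_low, w_high in cases:
    g = required_gain(t_f, w_low, w_high)
    print(f"{label}: {g / math.pi:.1f} pi nepers, amplitude factor {math.exp(g):,.0f}")

# pi nepers of gain carried down to omega = 0.2 (the next paragraph's example)
print(f"{prediction_time(math.pi, 0.2):.1f} s")
```

Under this assumption the factors come out at about 23 (π nepers), 535 (2π nepers) and 12,400 (3π nepers), and carrying π nepers of gain down to ω = 0.2 gives a prediction time of 10 seconds, as stated in the next paragraph.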

Reasonable multiplying factors with long 
prediction times can be obtained only by carry- 
ing the negative attenuation region to very low 
frequencies. As indicated previously, the cost 
of this is an increase in the time required for 
the signal to change from one constant or 
nearly constant value to another. For example, in the first illustration above, if the region of π nepers net gain is carried down from ω = 1 to ω = 0.2, the integral in (3) is just five
times as great as it was before, so that the 
characteristic corresponds to a prediction time 
of 10 rather than 2 seconds. This change 
would correspond to an increase* from perhaps 
4 or 5 to perhaps 20 or 25 seconds in the time 
required for the circuit to settle from one con- 
stant value to another. 

Practical examples of the transmission characteristics of overall prediction circuits, with particular emphasis on the dominant effect of even very small negative attenuations at extremely low frequencies, are shown later in Figures 5 to 8, inclusive. In the linear predictor, A − A₀ varies as −kω² near zero, and it is easily seen that such a term makes a finite contribution to the integral in (3). On the other hand, the attenuation of the quadratic predictor, which is capable of dealing exactly with polynomial functions of time of the second degree or less, is necessarily zero at the origin† to terms of the order of ω⁴, so that the integral in this region can be neglected. This slight difference between the two characteristics at frequencies of the order of 0.01 cycle per second and below is sufficient to balance the obviously greater negative attenuation of the quadratic predictor at higher frequencies.

* Only rough numbers can be given, since circuits with the square-cornered attenuation characteristics chosen for illustrative purposes would have very ripply transient characteristics, corresponding to no very well marked settling time.

† See the treatment of Quasi-Distortionless Prediction Networks in Appendix A.
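The contrast between the two predictors can be made explicit with a small-frequency estimate. As a sketch, assume (as the numerical examples earlier in this chapter imply) that the integral in (3) weights the attenuation by 1/ω², and let k and c be constants describing the two characteristics over a low-frequency interval 0 < ω < ε:

$$\int_0^{\varepsilon} \frac{-k\,\omega^2}{\omega^2}\,d\omega = -k\,\varepsilon, \qquad \int_0^{\varepsilon} \frac{c\,\omega^4}{\omega^2}\,d\omega = \frac{c\,\varepsilon^3}{3}.$$

The quadratic term contributes in proportion to the width of the region however narrow it is taken, while the quartic term's contribution falls off as ε³. This is why the low-frequency behavior of the linear predictor uses up a finite share of the integral while that of the quadratic predictor does not.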


Chapter 9 


THE discussion in the previous two chap- 
ters has been based upon the assumption 
that the least squares criterion forms a suita- 
ble measure of performance for a predicting 
network. This assumption permitted us to re- 
strict our attention to the amplitude spectra 
of the signal and noise, leaving phase relations
entirely out of account. Thus, both signal and 
noise could be thought of as "random noise" 
functions characterized by random phases and 
Gaussian distributions, as described in the 
preceding chapter. So far as the noise is con- 
cerned, there seems to be nothing wrong with 
this assumption. In the case of the signal, how- 
ever, it appears that significant phase relations 
may exist. This chapter will consequently set 
up an alternative analysis which permits the 
significance of possible phase relations in the 
target paths to be estimated. 

The alternative analysis is based upon the 
assumption that the target courses are sequen- 
ces of analytic segments of different lengths 
joined together. These segments are simple 
predictable curves such as straight lines, pa- 
rabolas, and circles. Significant phase relations 
are implied by the assumption that there are 
sudden changes from one type of course to another.

This picture of target paths is, of course, 
extreme. There are no such sharp discontinui- 
ties between one segment and another, nor do 
airplanes fly perfectly along simple curves 
even for limited periods. Nevertheless, it is 
the conception of target courses upon which 
the rest of our analysis is based. The reasons 
for believing that it is a closer approximation 
to actual target courses than, say, a random 
noise function with the same power spectrum 
would be, are given later. Perhaps more im- 
portant is the fact that the possibility of hit- 
ting an airplane flying along such a simple 
analytic arc is much greater than it would be 
if we were attempting to predict a correspond- 
ing random noise function. It is thus advan- 
tageous to take the analytic arc assumption as 
a basis for designing the prediction circuit, 

even if the assumption seems to be reasonably 
well justified over only occasional segments of 
actual target paths. An example of such a 
situation is furnished by the bombing run 
illustration described in Chapter 7. 

As a corollary to the analytic arc assumption it is also assumed that the theoretical
predicted point must be quite close to the actual 
target position if the probability of scoring a 
hit is to be appreciable. In other words, such 
dispersive factors as random errors in com- 
puter or gun or the lethal radius of the shell, 
which would tend to produce occasional hits at 
long distances from the theoretical predicted 
point, are quite small. This is such a plausible 
assumption in the light of present-day antiair- 
craft experience that its critical importance in 
the present argument is likely to go unper- 
ceived. However, this is the assumption which 
limits consideration to small errors in predic- 
tion, whereas the least squares criterion natu- 
rally gives greatest emphasis to large errors. 
If, for example, antiaircraft projectiles were 
suddenly endowed with a much greater destructive radius, we would be much more interested in fairly large misses, and the objections to the least squares criterion would disappear.

These postulates are discussed in more detail 
in the following sections. In anticipation of 
this discussion the following conclusions may 
be mentioned: 

1. With the assumptions as stated, the pre- 
diction should be on a modal rather than a 
least squares basis. In other words, the gun 
should be aimed at the most probable future 
position of the target. 

2. Modal prediction requires evaluation of 
the parameters of the analytic arc the target 
is at present traversing. This can be accom- 
plished by smoothing the values of these pa- 
rameters evaluated for a period in the past. 

3. If the smoothing is performed by linear 
invariable networks, the impulsive admittances 
of these networks should have a definite cutoff 
after a finite smoothing time. By this means 




all data over a certain age are given zero weight. 
The method of calculating the proper smooth- 
ing time is developed. 

4. Definite advantages can be obtained from 
circuits with variable smoothing times if such 
systems can be satisfactorily mechanized. 


The target courses, like the tracking errors, 
can be thought of as a statistically generated 
set of functions — that is, a stochastic process. 
The structure of this process is, however, very 
different from that of the tracking errors. It 
is by no means satisfactory to assume the
target courses to be equivalent to a random 
noise having the same power spectrum as the 
target courses. As we pointed out in Chapter 
7, the target is piloted by a purposeful human 
being. It tends to follow a definite simple curve 
for a period of time and then to shift to a new 
simple curve. Much of the flight is in attempted 
straight lines with constant velocity. Most of 
the remainder can be considered to be segments 
of circles or helices in space, or as segments of 
parabolas or higher degree curves. Straight 
line constant speed flight corresponds to the 
airplane controls in a neutral position. The 
helical flight is a natural generalization allow- 
ing arbitrary, but fixed, positions of the con- 
trols. The curves which are parabolic functions 
of time correspond to constant acceleration in 
the three space coordinates. Thus, all these 
assumptions have a reasonable physical background.

Most antiaircraft computers are constructed 
on the assumption of straight line flight, al- 
though some work has been done in World 
War II on curved flight directors both with the 
helical and the parabolic assumptions. There is 
not a great deal of difference in these two 
generalizations from the practical point of 
view, since determination of acceleration terms 
is subject to such large errors in any case. 

The important part of this representation 
of the target courses is that they consist of 
segments of simple analytic curves joined to- 
gether. The individual segments are completely 
predictable if we have a part of the segment 
given exactly. One need merely evaluate the 
parameters of the segment from the given part 

and evaluate the curve for t = t_f. The unpredictable part of the target courses is due to the
possibility of sudden changes from one segment 
to another. With random noise functions the 
unpredictableness occurs continuously. 

This simplified description of the target 
courses as piecewise analytic functions must 
be recognized as only a first approximation. A 
more complete description of the target course 
would include the "fine structure," the con- 
necting curves between the various analytic 
segments and the deviations from the segments 
due to random air disturbances and similar 
causes. This latter effect, the wandering of the 
target from its intended path, might be reason- 
ably well represented by the addition of a 
random noise function to the piecewise analytic 
functions described above. 


The analytic segments of which the course 
is supposed to consist are not all of the same 
duration — we may assume some probability 
distribution of the duration of these segments. 
The simplest assumption here is that the 
breaks occur in a Poisson distribution in time. 
This assumption is not necessary for our 
analysis but is a reasonable one and leads to 
a simple mathematical treatment. Any other 
reasonable distribution would give comparable results.

A series of events is said to occur in a 
Poisson distribution in time if the periods be- 
tween successive events are independent in the 
probability sense and are controlled by a distri- 
bution function 

$$p(l)\,dl = \frac{1}{a}\,e^{-l/a}\,dl.$$

Here p(l)dl is the probability of an interval of
length between l and l + dl. This means that
the frequency of intervals of a given length is 
a decreasing exponential function of the length. 
This type of distribution is familiar in physics 
as describing the decay of radioactive sub- 
stances. The time a in the distribution function 
is the average length of the intervals, since

$$\int_0^{\infty} l\,\frac{1}{a}\,e^{-l/a}\,dl = a.$$

It is related to the "half life" b of the intervals (the length exceeded by just half of them) by

$$b = a \ln 2.$$

The single number a completely specifies the 
Poisson distribution. The events may be said 
to be happening as randomly as possible apart 
from the fact that they occur at an average 
rate of 1/a per second. 

Another way of describing a Poisson distri- 
bution of events is the following. The probabil- 
ity of an event in a small interval of duration 
dl is (1/a)dl and is independent of whether or
not events have occurred in any other nonover- 
lapping intervals. 
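The two descriptions of a Poisson distribution can be checked against each other numerically. The sketch below generates breaks with the second rule (a break with probability dt/a in each short step dt, independently of all other steps) and compares the resulting interval lengths with two properties of the first description: mean a, and half life a ln 2, the length that half the intervals exceed. The values a = 10 s and dt = 0.05 s are arbitrary choices for illustration; because dt is finite the simulated intervals are geometric rather than exactly exponential, so the agreement is approximate.

```python
import math
import random

def poisson_break_intervals(a, dt, n_intervals, seed=1):
    """Interval lengths between breaks, using the small-interval rule:
    in each step of length dt a break occurs with probability dt / a,
    independently of every other step."""
    rng = random.Random(seed)
    intervals, elapsed = [], 0.0
    while len(intervals) < n_intervals:
        elapsed += dt
        if rng.random() < dt / a:
            intervals.append(elapsed)
            elapsed = 0.0
    return intervals

a = 10.0                      # average interval length, seconds (illustrative)
lengths = sorted(poisson_break_intervals(a, dt=0.05, n_intervals=5000))
mean_length = sum(lengths) / len(lengths)
median_length = lengths[len(lengths) // 2]
print(f"mean {mean_length:.2f} s (a = {a:.2f})")
print(f"half life {median_length:.2f} s (a ln 2 = {a * math.log(2):.2f})")
```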



Let us suppose that we have a record of the 
course of the target up to the present time and 
a complete statistical description of the set of 
target courses. What can then be said about the 
position of the target t_f seconds from now? If
we were able to analyze the data completely 
the most we could obtain would be a probability 
distribution function for the future position. 
This distribution function would give the prob- 
ability, in the light of the course history, of 
the target being at any point in space at the 
future time. This function would assume large 
values at likely points and low values at unlikely points. For t_f small the distribution would be highly concentrated and for larger t_f
it would tend to spread out. 

In the simple case we have been discussing, 
of a Poisson distribution of sudden changes in 
type of course, the distribution consists of two 
parts. First, there is a spike of probability at 
one point, the continuation of the present pre- 
dictable segment. Second, there is a continuous 
distribution which corresponds to possible 
changes to a new segment during the time of 
flight. As t_f increases the total probability in
the spike decreases exponentially toward zero, 
and the total in the continuous part increases 
exponentially toward unity. The behavior is 
roughly as indicated in Figure 1. 




Figure 1. Probability distribution of future position of target, assuming piecewise analytic courses.

A very different type of future position dis- 
tribution is exhibited with other assumptions 
about the target courses. For example, suppose 
the courses were random noise functions with 
the power spectrum 

$$P(\omega) \propto \frac{1}{\alpha^2 + \omega^2}.$$

A typical noise function with this spectrum is 
shown in Figure 2. In Figure 3 is shown a 
typical velocity under the other assumption, 
that the courses are piecewise analytic and in 
fact straight lines between breaks. If the 
breaks are Poisson distributed, both Figure 2 
and Figure 3 have the same power spectrum, 
1/(α² + ω²). The future distribution of velocities for Figure 3 is shown in Figure 1, and for
Figure 2, it will be as shown in Figure 4. In the 
random noise case the future distribution is a 




Gaussian distribution with no spike. The center of this distribution decreases exponentially toward zero with increasing time of flight according to the formula

$$X_{t_f} = X_0\,e^{-\alpha t_f},$$

where X_0 is the present value of the function and X_{t_f} is the mean of the future distribution.

Figure 2. Typical noise function.

The standard deviation σ of the distribution increases exponentially toward the rms value A of the function according to

$$\sigma = A\sqrt{1 - e^{-2\alpha t_f}}.$$
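The contrast between Figures 1 and 4 can be reproduced with a small simulation. The model here is an assumption of the sketch, not a statement from the text: the piecewise process keeps its present value x0 until a Poisson break (mean spacing a), after which it takes a fresh value drawn from a zero-mean Gaussian of rms value s. Its autocorrelation is then s²e^(−|τ|/a), the same as that of a Gaussian random-noise process with α = 1/a, whose future distribution given x0 has mean x0·e^(−αt_f) and standard deviation s·√(1 − e^(−2αt_f)). The numbers chosen are illustrative.

```python
import math
import random

def piecewise_future(x0, t_f, a, s, n=100_000, seed=2):
    """Future values of a piecewise-constant process: the present value persists
    unless at least one Poisson break (mean spacing a) occurs within t_f, in which
    case the value is a fresh N(0, s**2) draw (assumed model)."""
    rng = random.Random(seed)
    p_keep = math.exp(-t_f / a)
    return [x0 if rng.random() < p_keep else rng.gauss(0.0, s) for _ in range(n)]

def ou_future(x0, t_f, alpha, s):
    """Mean and standard deviation of the future value of a Gaussian random-noise
    process with spectrum proportional to 1/(alpha**2 + omega**2) and rms value s."""
    return x0 * math.exp(-alpha * t_f), s * math.sqrt(1.0 - math.exp(-2.0 * alpha * t_f))

x0, t_f, a, s = 1.5, 5.0, 10.0, 1.0
samples = piecewise_future(x0, t_f, a, s)
pw_mean = sum(samples) / len(samples)
spike_mass = sum(1 for x in samples if x == x0) / len(samples)
ou_mean, ou_sd = ou_future(x0, t_f, 1.0 / a, s)
print(f"piecewise: mean {pw_mean:.3f}, probability in spike at x0 {spike_mass:.3f}")
print(f"random noise: mean {ou_mean:.3f}, standard deviation {ou_sd:.3f}")
```

The two conditional means agree, so a least squares predictor would aim at the same point for both kinds of course. What differs is that about 61 per cent of the piecewise probability sits exactly at x0, while the random-noise distribution has no spike at all.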

Supposing that this distribution function 
could be determined, where should the gun be 
aimed? The answer to this will depend on two 
factors: the gun dispersion, and the lethal 




Figure 3. Typical velocity function. 

effects of the shell. If the gun is aimed to 
explode the shell at a certain point in space, 
the shell will not necessarily explode at that 
point, but rather there will be a distribution of 
positions centered about the point aimed at, 
because of gun dispersion. Also, if the shell 
explodes at a certain point and the target is at 

another point, there will be a certain proba- 
bility of lethal effect which decreases rapidly 
with increasing distance between the points. 
These two functions could be combined by a 
product integration to give the probability of 
a hit if the target is at one point and




Figure 4. Probability distribution of future posi- 
tion of target, assuming courses with random 
noise properties. 

the gun aimed to explode the shell at a second 
point. To determine the probability of a hit 
when aiming at a certain point, then, we should 
multiply the probability of the target being at 
each point in space by the probability of lethal 
effect when it is at that point and integrate the 
product over all space. The optimum point of 
aim will be the one which maximizes this in- 
tegrated product. 

In one dimension this may be expressed 
mathematically as follows. Let P(x) be the 




future position distribution of the target, so 
that P(x)dx is the probability of it being in 
the interval from x to x + dx at the future time. 
Let Q(x,y) be the probability of hitting the 
target if the gun is aimed at point y and the 
target is at point x. Then the total probability 
of a hit when aiming at point y is 



$$R(y) = \int_{-\infty}^{\infty} P(x)\,Q(x,y)\,dx.$$

The point of aim y should be chosen to maxi- 
mize R(y). 

In the cases we consider, the lethal radius of 
the shell and the dispersion of the gun are both 
assumed to be small in comparison with the 
range of future positions if there is a change 
of course during the time of flight. This means 
that Q(x,y) is small unless x is very near to y.
Q(x,y) can be, in fact, considered to be a δ
function of (x-y), and the value R(y) is then 
just a constant times P(y). Thus, the best 
aiming point under this assumption is the most 
probable future position of the target. The as- 
sumption of small lethal distance is generally 
valid with antiaircraft fire and ordinary chemi- 
cal explosive shells. 
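The maximization of R(y) can be carried out numerically for a one-dimensional example. Everything in this sketch is assumed for illustration: the future-position distribution is a spike of probability 0.4 at x = 0 plus a broad Gaussian part centered at x = 3, and Q(x, y) is a narrow Gaussian hit-probability curve of width 0.1 with peak value 1. The integral over the continuous part is evaluated by a simple Riemann sum.

```python
import math

def gaussian_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical future-position distribution: a spike of probability p_spike at the
# extrapolated point x_spike plus a broad continuous part N(mu_cont, sd_cont**2).
p_spike, x_spike = 0.4, 0.0
mu_cont, sd_cont = 3.0, 2.0
h = 0.1  # assumed lethal/dispersion scale, small compared with sd_cont

def Q(x, y):
    """Assumed probability of a hit when aiming at y with the target at x."""
    return math.exp(-0.5 * ((x - y) / h) ** 2)

dx = 0.01
xs = [-10.0 + i * dx for i in range(int(round(20.0 / dx)) + 1)]

def R(y):
    """Integral of P(x) Q(x, y) dx; the spike contributes p_spike * Q(x_spike, y)."""
    continuous = sum(gaussian_pdf(x, mu_cont, sd_cont) * Q(x, y) for x in xs) * dx
    return p_spike * Q(x_spike, y) + (1.0 - p_spike) * continuous

ys = [-2.0 + i * 0.05 for i in range(121)]
best_y = max(ys, key=R)
mean_position = p_spike * x_spike + (1.0 - p_spike) * mu_cont
print(f"best aim {best_y:.2f}, mean of distribution {mean_position:.2f}")
print(f"R at best aim {R(best_y):.3f}, R at mean {R(mean_position):.3f}")
```

The best aim lands on the spike, and R there is more than ten times its value at the mean of the distribution, the point a least squares predictor would choose.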

Now the most probable future position in our 
case is the spike of probability corresponding 
to the analytic extrapolation of the present seg- 
ment of the target course. To determine its 
position one must find the parameters of this 
segment and evaluate for t_f seconds in the
future. For example, if the segments are as- 
sumed to be straight lines (constant velocity 
target) the velocity components are determined 
and multiplied by t_f to give the predicted
change in position. These changes are added to 
the present position to give the future position. 
If helical or parabolic segments are assumed, 
the parameters of these curves are determined 
from the past data, and the curves extrapolated t_f seconds into the future.
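For straight-line segments the procedure reduces to fitting a line to the recent data and extending it. The sketch below works in one coordinate with made-up numbers (10 seconds of data at two samples per second, 10 yards rms tracking noise, a 200 yard-per-second target, t_f = 15 seconds) and uses an ordinary least squares fit via numpy.polyfit to estimate the present position and velocity. It illustrates only the extrapolation step, not the smoothing networks discussed in later chapters.

```python
import numpy as np

def straight_line_prediction(times, positions, t_f):
    """Fit position = p0 + v * t by least squares, then extrapolate t_f seconds
    beyond the latest sample: smoothed present position plus v * t_f."""
    velocity, intercept = np.polyfit(times, positions, 1)
    present = intercept + velocity * times[-1]
    return present + velocity * t_f, velocity

rng = np.random.default_rng(3)
times = np.arange(0.0, 10.0, 0.5)            # 20 samples over the last 10 s
true_velocity, start = 200.0, 5000.0         # yd/s and yd (illustrative)
positions = start + true_velocity * times + rng.normal(0.0, 10.0, times.size)

t_f = 15.0
predicted, v_est = straight_line_prediction(times, positions, t_f)
true_future = start + true_velocity * (times[-1] + t_f)
print(f"velocity {v_est:.1f} yd/s, prediction error {predicted - true_future:+.1f} yd")
```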

These conclusions may be contrasted with 
the idea of aiming at the point which mini- 
mizes the mean square error. The least squares 
criterion amounts to aiming at the mean or 
center of gravity of the future distribution of 
position. This point will ordinarily be under 
the continuous part of the distribution and not 
at the spike; e.g., the point marked in Figure 1. 
Its position depends to a considerable extent on 

distant parts of the distribution, which would 
surely be complete misses in any case. The chief advantage of the least squares criterion is that it fits in well with the mathematical tools suitable to these problems, leading to solvable equations.

The least squares criterion will still appear in our analysis in that we attempt to smooth our course parameters in such a way as to minimize the mean square error in these, a very different thing from minimizing the mean square error in the predicted position of the target.


The changes in the course parameters between adjacent segments can be very large.
Also, at the start of operations and in changing 
from one target to another there will be large 
and erratic variation of the input to the 
smoothing and predicting circuits, unrelated to 
the present target course. If any of these data 
are used in prediction, the result will almost 
surely be a miss because of the small lethal 
radius of the shell. The only way to eliminate 
these errors in a linear invariable system is to 
have all weighting functions cut off sharply 
after a short time. Then all data over a certain
age are eliminated. Hits will occur only when 
the target has been on a predictable segment for 
this length of time or more and remains there 
at least t_f seconds in the future.

Suppose the weighting function for velocity 
has a 1 per cent tail beyond the cutoff point 
and that the trackers start following the target 
from a zero position. Then after the smoothing 
time there will be, because of the lack of exact 
cutoff, a 1 per cent error in velocity. If the 
time of flight were 15 seconds and the target 
velocity 200 yards per second, this represents 
an error of 30 yards in predicted position.
Since this is comparable to the other errors in 
a typical director, we conclude that the tail of 
the smoothing curve should not be much greater 
than 1 per cent of its total area. 

Under the assumptions we have made, the 
proper smoothing time to maximize the number 
of hits can be determined as follows. Let P(l) 





be the probability that a predictable segment 
of the course lasts for I seconds or more. In 
the Poisson case this function is 

$$P(l) = e^{-l/a}.$$

With a given smoothing time S there will be a 
certain probability of hitting the target, as- 
suming it has been on the present segment for 
S seconds in the past and will remain there for 
t_f seconds in the future. We assume changes
in course to be so large that any change re- 
sults in a miss. This probability of a hit Q(S), 
provided it remains on the course, will be an 
increasing function of S. Ordinarily the standard deviation will vary inversely as the square root of the smoothing time. We have assumed the
lethal radius of the shell small compared to the 
dispersion of shells about the target. The prob- 
ability of a hit will then vary inversely with 
the volume through which the shells are dis- 
persed. If the gun itself had no dispersion but 
all errors were due to tracking errors (and if 
the tracking error spectrum is flat), the probability of a hit would then vary as KS^{3/2} for S in the region of interest. This is because there are three dimensions and the expected error in each of these is decreasing as S^{-1/2}. With gun dispersion present, Q(S) will have the form

$$Q(S) = K\left(\sigma_1^2 + \sigma_2^2\,\frac{a}{S}\right)^{-3/2},$$

where σ₁ is the standard deviation due to the gun dispersion, and σ₂√(a/S) that due to tracking errors. The sum of the squares is the total variance in each dimension and the three-halves power gives the total dispersion volume.

When these two functions P(l) and Q(S) are known, the best smoothing time is that which maximizes the product

$$P(S + t_f)\,Q(S).$$

The first term is the probability of a predictable segment of the course lasting S + t_f seconds, and the second term is the probability of
a hit if it does last that long. Therefore, the 
product is the probability of a hit with smooth- 
ing time S. 

In the Poisson case, with no gun dispersion, the calculation is as follows:

$$P(l) = e^{-l/a},$$

$$P(S + t_f) = e^{-(S + t_f)/a} = A\,e^{-S/a},$$

$$Q(S) = K\,S^{3/2},$$

$$f(S) = P(S + t_f)\,Q(S) = B\,e^{-S/a}\,S^{3/2}, \qquad B = AK,$$

$$f'(S) = B\left[\tfrac{3}{2}\,S^{1/2} - \tfrac{1}{a}\,S^{3/2}\right]e^{-S/a} = 0,$$

$$S = \tfrac{3}{2}\,a.$$

The proper smoothing time is three halves of the average segment length, and is independent of the time of flight and all other factors.
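A brute-force check of this result is a grid search over S under the same assumptions (Poisson segment lengths with mean a, no gun dispersion, Q(S) proportional to S^{3/2}). The constant K, the grid, and the (a, t_f) pairs are arbitrary choices for the sketch.

```python
import math

def hit_probability(S, a, t_f, K=1.0):
    """f(S) = P(S + t_f) Q(S) with P(l) = exp(-l/a) and Q(S) = K S**1.5."""
    return math.exp(-(S + t_f) / a) * K * S ** 1.5

def best_smoothing_time(a, t_f, n=20_000):
    """Grid search for the S in (0, 10a] that maximizes f(S)."""
    grid = [10.0 * a * (i + 1) / n for i in range(n)]
    return max(grid, key=lambda S: hit_probability(S, a, t_f))

for a, t_f in [(10.0, 5.0), (10.0, 20.0), (30.0, 15.0)]:
    print(f"a = {a}, t_f = {t_f}: best S = {best_smoothing_time(a, t_f):.2f} (1.5a = {1.5 * a})")
```

The maximizing S stays at 1.5a whatever t_f is, because the factor e^(−t_f/a) only rescales f(S) by a constant.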

The presence of gun dispersion and computer 
errors which are independent of smoothing 
time decreases the best S from this value. In this case the optimal S is given by

$$\frac{2S}{a} = \frac{-\sigma_2^2 + \sigma_2\sqrt{\sigma_2^2 + 6\sigma_1^2}}{\sigma_1^2}.$$

Here σ₁ is the part of the errors which is independent of smoothing time (dispersion, errors in the computer, etc.) and σ₂√(a/S) is the error which varies inversely with the square root of S, σ₂ being its value at S = a. Ordinarily σ₁ is several times σ₂, in which case we have approximately

$$S \approx \sqrt{\tfrac{3}{2}}\,\frac{\sigma_2}{\sigma_1}\,a.$$
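The optimum with gun dispersion follows from setting the derivative of ln f(S) = −S/a − (3/2) ln(σ₁² + σ₂²a/S) to zero, which gives the quadratic σ₁²S² + σ₂²aS − (3/2)σ₂²a² = 0. The sketch compares its positive root with a numerical maximization and with the large-σ₁ approximation. The parameter values are illustrative.

```python
import math

def f_with_dispersion(S, a, s1, s2):
    """Hit probability up to a constant: exp(-S/a) * (s1**2 + s2**2 * a / S) ** -1.5."""
    return math.exp(-S / a) * (s1 ** 2 + s2 ** 2 * a / S) ** -1.5

def best_S_closed_form(a, s1, s2):
    """Positive root of s1^2 S^2 + s2^2 a S - 1.5 s2^2 a^2 = 0."""
    return a * (-s2 ** 2 + s2 * math.sqrt(s2 ** 2 + 6.0 * s1 ** 2)) / (2.0 * s1 ** 2)

def best_S_numeric(a, s1, s2, n=200_000):
    """Grid search for the maximizing S on (0, 3a]."""
    grid = [3.0 * a * (i + 1) / n for i in range(n)]
    return max(grid, key=lambda S: f_with_dispersion(S, a, s1, s2))

a = 10.0
for s1, s2 in [(0.01, 1.0), (1.0, 1.0), (4.0, 1.0)]:
    exact = best_S_closed_form(a, s1, s2)
    approx = math.sqrt(1.5) * (s2 / s1) * a
    print(f"s1/s2 = {s1 / s2:5.2f}: S = {exact:7.3f}, numeric {best_S_numeric(a, s1, s2):7.3f}, "
          f"approximation {approx:8.3f}")
```

For σ₁ = 4σ₂ the approximation is within about 11 per cent of the exact root. For small σ₁ the approximation fails, and the exact root approaches 1.5a, as in the dispersion-free case.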


There are other factors which we have neg- 
lected, which decrease the best smoothing time 
still further. The wandering of the target about 
the predictable segments assumed in the above 
simplified analysis makes old data less reliable 
and therefore reduces S. Also, there is the tac- 
tical consideration that when starting to track 
a target it is desirable to commence firing as 
soon as possible, even if reducing this time 
makes individual hits somewhat less probable. 
For these and other reasons the best smooth- 
ing time will be just a fraction of a. 





The compromise required in choosing a cer- 
tain definite smoothing time can be eliminated 
by the use of nonlinear elements. In particular, 
if a method is devised for determining when 
changes of course occur, this indication can be 
used to start a new linear but variable smooth- 
ing operation, so that the device uses all the 
data pertinent to the present segment and no 
data from previous segments. There is a clear 
improvement in such cases although not so 
great as might be expected. There are many 
practical difficulties in proper adjustment of 
such a "trigger" action. If the trigger is too 
sensitive it will assume new segments due 
merely to tracking noise and seldom allow suffi- 
cient smoothing for accurate fire. If it is too 
insensitive it fails in its function of quickly 

locating changes of segment. Since the noise 
and target courses are subject to considerable 
variation, this adjustment is not easy.
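The trigger idea can be sketched in a few lines. The detection rule below (a run of consecutive large deviations from the current mean) is a hypothetical choice for illustration, not a mechanism described in the text. Between triggers the smoothing is an ordinary equal-weight average over the data since the last detected change, so it remains linear but variable. The parameters threshold and run_length play the role of the trigger sensitivity.

```python
import random
import statistics

def triggered_smoother(samples, threshold, run_length=3):
    """Equal-weight running mean that restarts when a change of segment is detected.

    Hypothetical trigger rule: a change is declared when run_length consecutive
    samples each differ from the current mean by more than threshold; isolated
    outliers are dropped. Returns (estimates, indices where new segments start).
    """
    window, pending = [], []
    estimates, starts = [], []
    for i, x in enumerate(samples):
        if window and abs(x - statistics.fmean(window)) > threshold:
            pending.append(x)
            if len(pending) >= run_length:      # trigger: start a new segment
                starts.append(i - run_length + 1)
                window, pending = pending, []
        else:
            pending = []
            window.append(x)
        estimates.append(statistics.fmean(window))
    return estimates, starts

rng = random.Random(4)
true_value = [0.0] * 100 + [10.0] * 100          # parameter jumps at sample 100
noisy = [v + rng.gauss(0.0, 1.0) for v in true_value]
estimates, starts = triggered_smoother(noisy, threshold=5.0)
print(starts, round(estimates[99], 2), round(estimates[-1], 2))
```

Lowering threshold or run_length makes the trigger more sensitive, at the risk of restarting on noise alone. Raising them delays detection of a genuine change.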

In such a system the smoothing may be 
linear — the only nonlinearity is the tripping 
circuit. The analysis of best weighting func- 
tions, etc., given in later chapters can for the 
most part be applied to such cases. There may 
also be advantages to be derived from making 
the smoothing operator depend on the general 
position in space of the target relative to the 
gun. The smoothing time may be varied, for 
example, as a function of the time of flight. 
This type of variation would be slow compared 
to the noise frequency, and here again the 
linear analysis can be used. 

Whether any real advantage can be obtained by "strongly" nonlinear smoothing in practical cases other than these two possibilities is questionable.


Chapter 10 


The analytic arc assumption described in 
the previous chapter immediately allows us 
to reduce a vast proportion of data-smoothing 
problems to a relatively concrete form. Obviously the arc will be specified by a number of
parameters and the principal object of the com- 
puting and data-smoothing circuits must be to 
isolate values of these parameters on the basis 
of which a prediction can be made. In practi- 
cal cases the instantaneous values of the 
parameters are isolated by coordinate con- 
verters. The function of the data-smoothing 
circuit is to provide a suitable average from 
these instantaneous values. This is called "smoothing a constant" here since the parameters are assumed to be constant along each
arc, although they may change radically from 
one arc to another. 

The data-smoothing network is most con- 
veniently specified by its impulsive admittance. 
(See Appendix A.) In accordance with the 
assumptions made in the previous chapter, it 
will be assumed that the desired impulsive ad- 
mittance is identically zero after some limiting 
time T. Thus, T seconds after a change from 
one analytic arc to the next the new parameter 
value is established. T is the so-called "settling 
time" of the data-smoothing network. 

With the settling time limit given, the prob- 
lem of choosing a suitable data-smoothing net- 
work reduces to that of finding the best shape 
of the impulsive admittance characteristic for 
t < T. Obviously this shape determines how 
the output of the network changes in going 
from the parameter value appropriate for the 
first arc to that appropriate for the second. The 
exact way in which the response settles from 
one constant value to the next is, however, 
usually of comparatively little interest. The 
shape of the weighting function is of impor- 
tance chiefly because of its effect on the noise. 
For each noise spectrum there is, in principle, 
an optimum shape for the weighting function. 
The present chapter approaches the problem of 
choosing a shape which will minimize the effect 
of noise from several points of view. 

It should be noted that the term noise as used 
here does not necessarily refer to the errors 
associated directly with the tracking data. The 
tracking data may have been subjected to co- 
ordinate conversions, differentiations, or other 
processes of computation before reaching the 
data-smoothing network.¹ The noise associated
with the signal to be smoothed thus will usually 
have characteristics differing from those of the 
noise associated with the tracking data. 


Before attacking the problem of smoothing a 
constant in a systematic way it is worth while 
to consider an important special case. This is 
the so-called exponential smoothing circuit. It 
leads to a data-smoothing network in which 
the output V is related to the input E by

$$V(t) = \frac{1}{a}\int_0^{\infty} e^{-\tau/a}\,E(t-\tau)\,d\tau,$$

so that the impulsive admittance W(t) is an 
exponential function of time, as illustrated by 
Figure 1. 


Figure 1. Simple exponential weighting function. 

An impulsive admittance of the type shown 
in Figure 1 does not show any very definite 
settling time. The exponential curve ap- 
proaches zero gradually, and it is a long time 
after a change in course before the effects of 
the data obtained on the old course are negli- 
gible. This is obviously an undesirable result, 

¹ In exceptional circumstances the physical apparatus in which these processes are carried out may also be a source of additional noise.





and the exponential weighting function is con- 
sequently not a recommended one for situations 
to which the analytic arc assumption applies. 
The exponential solution is, however, described 
here because it occurs in such a vast variety of 
cases. It is found, in fact, whenever the data- 
smoothing device is specified by a linear first- 
order differential equation with constant coeffi- 
cients. It may thus correspond to many simple 
situations. For example, this is the result 
which would be obtained in an electrical circuit 
if we smoothed the data by placing a simple 
shunt capacity across a resistance circuit. In 
mechanical structures it is encountered when- 
ever the damping depends either upon simple 
inertia or a simple compliance. 
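A simple shunt-capacity smoother obeys the first-order equation a dV/dt = E − V, whose impulsive admittance is (1/a)e^(−t/a). The sketch steps that equation with Euler's method (an assumed discretization) to show how slowly the influence of an old course dies away. The smoother starts settled at 1 on the old course, and at t = 0 the true parameter drops to 0. The values a = 2 s and dt = 1 ms are arbitrary.

```python
import math

def first_order_smoother(samples, dt, a, v0=0.0):
    """Euler steps of a * dV/dt = E - V, the first-order lag whose impulsive
    admittance is W(t) = exp(-t / a) / a."""
    v, out = v0, []
    for e in samples:
        v += (e - v) * dt / a
        out.append(v)
    return out

dt, a = 0.001, 2.0
# The smoother has settled at 1 on the old course (v0 = 1); at t = 0 the
# true parameter switches to 0.
V = first_order_smoother([0.0] * int(round(12.0 / dt)), dt, a, v0=1.0)
for T in (2.0, 6.0, 12.0):
    residual = V[int(round(T / dt)) - 1]        # output after T / dt steps
    print(f"{T:4.1f} s after the change: residual {residual:.4f}, exp(-T/a) = {math.exp(-T / a):.4f}")
```

Six time constants after the change, about 0.25 per cent of the old value is still present. The weight on old data never becomes exactly zero.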

Simple exponential smoothing also occurs in 
a variety of other situations which may be 
somewhat less obvious. For example, it is the 
effective result in either an aided laying or a 
regenerative tracking scheme whenever the 
ratio between rate and displacement correc- 
tions is fixed. Another somewhat similar ex- 
ample is furnished by the feedback amplifier 
circuit shown in Figure 2. Since rapid fluctua- 

Figure 2. Feedback amplifier circuit giving simple
exponential weighting function. 

tions in the output of this amplifier are fed 
back through the capacity and tend to oppose 
the input voltage, the structure acts as a 
smoother, and more detailed analysis would 
show that it has characteristics similar to those 
obtained by using a shunt capacity across a 
resistance circuit. The structure is introduced 
here because considerable use is made of it in 
connection with the discussion of nonlinear 
smoothing in a later chapter. 

One simple conclusion about data-smoothing 
networks can be drawn immediately from this 
discussion. Since all structures simple enough 
to be specified by a first-order differential equa- 

tion give exponential smoothing, which has no 
very well-marked settling time, it is clear that 
a data-smoothing network which shows a well- 
defined settling time must probably be at least 
moderately complicated. 


Consider the signal E shown in Figure 3 
under the assumption that the true signal is 
constant and the superposed noise is random 

Figure 3. Piecewise constant signal with noise. 

with a flat spectrum. The best constant A, in the least squares sense, which can be fitted to the signal from t − T to t is that which minimizes

$$\int_{t-T}^{t} [A - E(\lambda)]^2\,d\lambda,$$

that is,

$$A = \frac{1}{T}\int_{t-T}^{t} E(\lambda)\,d\lambda. \qquad (1)$$

Comparing this with equation (2), Appendix A, it will be seen that A, which is obviously a function of t, is the response to the assumed signal of a network whose impulsive admittance is

$$W(t) = \frac{1}{T}, \qquad 0 < t < T. \qquad (2)$$
This is the best weighting function for smooth- 
ing under the assumed circumstances. It is 
illustrated in Figure 4. 

A more complex situation is one in which the 
true signal is a line of constant slope with 




Figure 4. Best weighting function for smoothing 
piecewise constant signal. 




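superposed flat random noise, as shown in Figure 5. For convenience the analysis will be conducted in terms of the age variable τ = t − λ.

Figure 5. Piecewise linearly varying signal with noise.

The best straight line A − Bτ which can be fitted to the signal from τ = 0 to τ = T is that which minimizes

$$\int_0^T [A - B\tau - E(t-\tau)]^2\,d\tau.$$

Hence A and B must satisfy simultaneously

$$A - \frac{BT}{2} = \frac{1}{T}\int_0^T E(t-\tau)\,d\tau, \qquad \frac{A}{2} - \frac{BT}{3} = \frac{1}{T^2}\int_0^T \tau\,E(t-\tau)\,d\tau. \qquad (3)$$

Eliminating A, we get

$$B = \frac{6}{T^3}\int_0^T (T - 2\tau)\,E(t-\tau)\,d\tau,$$

whence by partial integration

$$B = \frac{6}{T^3}\int_0^T E'(t-\tau)\,\tau\,(T-\tau)\,d\tau.$$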

Comparing this with (7), Appendix A, it will 
be seen that B, which is obviously a function of 
t, is the response to the derivative of the as- 
sumed signal of a network whose impulsive 
admittance is 


$$W(t) = \frac{6\,t\,(T-t)}{T^3}, \qquad 0 < t < T.$$


This is the best weighting function for smooth- 
ing the derivative of the signal under the as- 
sumed circumstances. It is illustrated in Fig- 
ure 6 and is generally referred to as the "para- 
bolic weighting function." 

It should be noted also that the right-hand member of the first of equations (3) is formally the same as that of equation (1), while its left-hand member, $A - (T/2)B$, is the value of the fitted line at $\tau = T/2$. Hence the response of the network specified by (2) and illustrated in Figure 4, to the type of signal shown in Figure 5, will correspond to the value on the best straight line T/2 seconds back from t, the present time. This network is still the best for smoothing the signal, but it introduces a delay of one half of the smoothing time. The delay may be reduced only at the price of a reduction in smoothing unless the smoothing time is increased.

Figure 6. Best weighting function for smoothing piecewise linearly varying signal.
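The two conclusions just reached can be checked numerically. Applied to an arbitrary signal, the uniform weighting (2) reproduces the value of the least-squares line at $\tau = T/2$. The parabolic weighting (4), applied to the derivative, reproduces the line's slope B. The following Python sketch is only an illustration under stated assumptions: the test signal `E`, the smoothing time `T = 2` and the present time `t = 5` are arbitrary choices, and every integral is replaced by a midpoint sum.

```python
import numpy as np

def E(lam):
    # Hypothetical smooth test signal; any differentiable signal would do.
    return np.sin(1.3 * lam) + 0.2 * lam**2

def dE(lam):
    # Its derivative E'(lambda), needed for the parabolic weighting.
    return 1.3 * np.cos(1.3 * lam) + 0.4 * lam

T, t = 2.0, 5.0                       # assumed smoothing time and present time
N = 20_000
dtau = T / N
tau = (np.arange(N) + 0.5) * dtau     # midpoint grid on 0 < tau < T

# Least-squares straight line A - B*tau fitted to E(t - tau) over the window.
# np.polyfit returns the highest power first: [slope, intercept] = [-B, A].
slope, A = np.polyfit(tau, E(t - tau), 1)
B = -slope

# Uniform weighting (2) applied to E: should equal the line's value at tau = T/2.
uniform_out = np.sum(E(t - tau)) * dtau / T
# Parabolic weighting (4) applied to E': should equal the slope B.
parabolic_out = np.sum(dE(t - tau) * 6.0 * tau * (T - tau)) * dtau / T**3

print(uniform_out - (A - B * T / 2), parabolic_out - B)   # both close to zero
```

The first difference stays at rounding level because the midpoint grid satisfies the first normal equation of the fit exactly. The second difference shrinks with the grid spacing.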


The autocorrelation method with finite settling time was first used by G. R. Stibitz in numerical determination of the best weighting function for smoothing the derivative of tracking data with typical tracking errors. This method was also used to determine the sensitivity of smoothing to departures of the weighting function from the best form.

The analysis is based upon the expression

$$F(t) = \int_0^T g'(t-\tau)\, W(\tau)\, d\tau, \qquad t > T,$$

for the response to the derivative of the error time function g(t) of a network whose impulsive admittance or weighting function W(t) is identically zero for t > T as well as for t < 0. Since measured tracking errors are generally tabulated only at 1-second intervals, the integral may be approximated by the sum

$$F(t) \approx \sum_{m=0}^{T-1} g'(t-m)\, W_m, \qquad W_m = W(m),$$

for integral values of t.

The instantaneous transmitted power is the square of this expression, and the average transmitted power is

$$P_{\mathrm{av}} = \lim_{T_0\to\infty}\frac{1}{2T_0}\int_{-T_0}^{T_0} [F(t)]^2\, dt .$$

This may be expressed in the form

$$P_{\mathrm{av}} = \sum_{m=0}^{T-1}\sum_{n=0}^{T-1} W_m\, C_{m-n}\, W_n , \qquad (5)$$

where

$$C_{m-n} = \lim_{T_0\to\infty}\frac{1}{2T_0}\int_{-T_0}^{T_0} g'(t-m)\, g'(t-n)\, dt$$

is the autocorrelation of the errors. Having computed the autocorrelation, (5) may be minimized with respect to the W's by familiar methods, under the constraint

$$\sum_{m=0}^{T-1} W_m = 1 .$$

The values of W thus obtained are the specification of the best weighting function.ᵇ Equation (5) may then be used to determine the sensitivity of smoothing to departures of the weighting function from the best form.
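The minimization of (5) under the constraint $\sum W_m = 1$ can be sketched as follows. The sketch assumes that the standard Lagrange-multiplier solution is an acceptable stand-in for the "familiar methods" of the text: the best W is proportional to the solution X of $\sum_n C_{m-n} X_n = 1$. The helper names are hypothetical. The two idealized autocorrelations are assumptions chosen because their answers are known: uncorrelated rate errors should give uniform weights, and rate errors obtained by differencing uncorrelated position errors should give a discrete parabola, in line with Figures 4 and 6. The last case estimates C from simulated data, as one would from tabulated tracking errors.

```python
import numpy as np

def toeplitz_from_autocorrelation(C):
    """Matrix with entries C_{|m-n|}, m, n = 0..T-1."""
    C = np.asarray(C, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(len(C)), np.arange(len(C))))
    return C[idx]

def best_weights(C):
    """Minimize P = sum_m sum_n W_m C_{m-n} W_n subject to sum_m W_m = 1.

    A Lagrange multiplier gives M W proportional to a vector of ones,
    where M is the Toeplitz matrix built from C.
    """
    M = toeplitz_from_autocorrelation(C)
    x = np.linalg.solve(M, np.ones(len(C)))
    W = x / x.sum()
    return W, float(W @ M @ W)

def sample_autocorrelation(x, max_lag):
    """Biased estimate C_k = (1/N) * sum_t x[t] * x[t + k], k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])

T = 10
# Uncorrelated rate errors: the best weights come out uniform.
W_flat, P_flat = best_weights(np.r_[1.0, np.zeros(T - 1)])

# Rate errors formed by differencing uncorrelated position errors have
# C_0 = 2, C_1 = -1, C_k = 0 otherwise; the best weights come out parabolic.
W_diff, P_diff = best_weights(np.r_[2.0, -1.0, np.zeros(T - 2)])
m = np.arange(T)
parabola = (m + 1) * (T - m) / np.sum((m + 1) * (T - m))

# Simulated "measured" errors: estimate C from the data, then minimize.
rng = np.random.default_rng(0)
rate_errors = np.diff(rng.standard_normal(5001))
W_est, P_est = best_weights(sample_autocorrelation(rate_errors, T - 1))
```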

Proceeding along this line, Stibitz found that the best weighting function for typical actual tracking errors was generally intermediate between the uniform and parabolic ones shown in Figures 4 and 6. Furthermore, Stibitz found that the difference in smoothing obtained from the best weighting function on the one hand and from the uniform or the parabolic weighting function on the other hand, is negligible in

The autocorrelation method was later formalized by R. S. Phillips and P. R. Weiss who incorporated it into a theory of prediction.⁷ A brief exposition of this formulation is given in Appendix B.


For the purposes of this method, an elementary noise pulse is defined by a time function $F_0(t)$ which satisfies the following requirements:

1. Identically zero when t < 0.

2. Contains no terms which increase exponentially with time.

3. Power spectrum $N(\omega^2)$ is the same as that of the noise.

The noise is then regarded as the result of elementary noise pulses started at random. Alternatively, it may be regarded as the result of flat random noise passed through a network whose transmission function is $S(p) = L[F_0(t)]$. As a matter of fact, only S(p) is required in the analysis, and this is readily determined from the relation

$$|S(i\omega)|^2 = N(\omega^2),$$

together with the condition that $S(i\omega)$ corresponds to the transmission function of a minimum-phase physical structure (cf. Appendix B).
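When the noise spectrum is a polynomial in $\omega^2$, the minimum-phase S(p) can be found in three steps: substitute $\omega^2 = -p^2$, find the roots of $N(-p^2)$, and keep those in the left half-plane. The sketch below assumes exactly that setting. The leading coefficient must be positive, and $N(-p^2)$ must have no roots on the imaginary axis. This excludes the case $N = \omega^{2n}$, for which $S(p) = p^n$ directly. The function name is illustrative.

```python
import numpy as np

def minimum_phase_factor(n_coeffs):
    """Return S(p) (coefficients, highest power first) with |S(i w)|^2 = N(w^2)
    and every root of S(p) in the left half-plane.

    n_coeffs: [n_0, n_1, ..., n_k] with N(w^2) = n_0 + n_1 w^2 + ... + n_k w^(2k).
    Assumes n_k > 0 and that N(-p^2) has no roots on the imaginary axis.
    """
    k = len(n_coeffs) - 1
    # Substitute w^2 = -p^2: the coefficient of p^(2j) becomes n_j * (-1)^j.
    poly_p = np.zeros(2 * k + 1)
    for j, n_j in enumerate(n_coeffs):
        poly_p[2 * k - 2 * j] = n_j * (-1) ** j
    roots = np.roots(poly_p)
    left = roots[roots.real < 0]            # keep the minimum-phase half
    return np.sqrt(n_coeffs[-1]) * np.real(np.poly(left))

# N(w^2) = (1 + w^2)(4 + w^2) = 4 + 5 w^2 + w^4  ->  S(p) = (p + 1)(p + 2)
S1 = minimum_phase_factor([4.0, 5.0, 1.0])
# N(w^2) = 4 + w^4  ->  S(p) = p^2 + 2p + 2 (complex-conjugate roots)
S2 = minimum_phase_factor([4.0, 0.0, 1.0])

# |S1(i w)|^2 should reproduce N(w^2) on the real frequency axis.
w = np.linspace(0.0, 5.0, 11)
check = np.abs(np.polyval(S1, 1j * w)) ** 2 - (4 + 5 * w**2 + w**4)
```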

The response F(t) to the elementary noise pulse $F_0(t)$ of a network whose impulsive admittance is W(t) is given by the operational equation

$$F(t) = S(p) \cdot W(t)$$

in accordance with the footnote in Section A.5, Appendix A. The best form for W(t) is therefore that which minimizes the integral

$$\int_0^\infty [F(t)]^2\, dt \qquad (6)$$

under the restriction

$$W(t) = 0 \ \text{when } t > T, \qquad \int_0^T W(t)\, dt = 1 . \qquad (7)$$


ᵇ The computations involved may be considerably reduced by noting the symmetry property proved in Section B.2, Appendix B.

This is as much of the elementary pulse method as we shall need in order to reconsider the cases treated in Section 10.2. For the treatment of more general cases the method is described in greater detail in Appendix B.

The minimization of the integral (6) under the restriction (7) reduces to a simple isoperimetric problem in the calculus of variations, in cases in which S(p) is a polynomial in p. It is essential first of all, however, to note that if S(p) is of degree n, the integral (6) will converge only if W(t) is differentiable at least n times. In other words, W(t) must have continuous derivatives of all orders up to the (n - 1)th inclusive, although the nth derivative may have finite discontinuities. In particular, if W(t) is to be zero outside of $0 < t < T$, its derivatives of orders up to the (n - 1)th inclusive must vanish at both t = 0 and t = T. These 2n boundary conditions must be imposed on the solution of the Euler equation, which in this case is

$$S(p)\, S(-p) \cdot W(t) = \lambda .$$

Here λ is a constant parameter which is finally adjusted so that the restriction (7) is satisfied.

The first case treated in Section 10.2 is one in which $N(\omega^2) = 1$, whence S(p) = 1 and F(t) = W(t). The integral (6) is a minimum under the restriction (7) if W(t) is constant by intervals. The restriction (7) then requires W(t) to be of the form (2).

The case of first derivative smoothing treated in 10.2 is one in which $N(\omega^2) = \omega^2$, whence S(p) = p and $F(t) = W'(t)$. If the integral (6) is to converge at all, $W'(t)$ must not have discontinuities of impulsive or higher type; in other words, W(t) must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W''(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form (4).

These results may be generalized immediately. In whatever way the signal to be smoothed may have been derived from the tracking data, let the power spectrum of the noise associated with it be $N(\omega^2) = \omega^{2n}$. Then $S(p) = p^n$ and $F(t) = W^{(n)}(t)$. If the integral (6) is to converge at all, $W^{(n-1)}(t)$ must be continuous through all values of t. The integral is a minimum under the restriction (7) if $W^{(2n)}(t)$ is constant by intervals. The restriction (7) then requires W(t) to be of the form

$$W(t) = \frac{(2n+1)!}{(n!)^2}\,\frac{1}{T}\left[\frac{t}{T}\left(1 - \frac{t}{T}\right)\right]^n, \qquad 0 < t < T. \qquad (8)$$

For n = 0 and n = 1, (8) reduces to (2) and (4), respectively.

It may be noted that the convergence requirements which arise in the foregoing discussion are directly related to the discussion and theorem in Section A.8, Appendix A, with respect to the relationship between discontinuities in the impulsive admittance and its derivatives on the one hand, and the ultimate cutoff characteristic of the transmission function on the other hand. The continuity of $W^{(n-1)}(t)$ is obviously required to make the transmission fall off ultimately at the rate of 6(n + 1) db per octave against the rise of 6n db per octave in the noise power spectrum.

The integral (6) may also be used to evaluate the relative advantage of the best weighting function over another weighting function. As an example, consider the case where the weighting function (2) is the best. The value of the integral (6) in this case is 1/T. If the weighting function (4) is used against the same noise, the value of the integral (6) is $\int_0^T (36/T^6)\, t^2 (T - t)^2\, dt = 6/(5T)$. Hence, as far as rms error or standard deviation is concerned, the second weighting function is $\sqrt{5/6}$, or 0.913, as efficient as the first.
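The figures 1/T, 6/(5T) and 0.913 follow directly from (8) with n = 0 and n = 1. The short numerical check below also confirms that each member of the family (8) is normalized. It assumes T = 1 and midpoint-rule quadrature; the helper names are illustrative.

```python
import math

def weighting(n, t, T=1.0):
    """Best weighting function (8) for the noise spectrum N(w^2) = w^(2n)."""
    if not 0.0 < t < T:
        return 0.0
    x = t / T
    return math.factorial(2 * n + 1) / math.factorial(n) ** 2 / T * (x * (1 - x)) ** n

def integral(f, T=1.0, steps=100_000):
    """Midpoint-rule approximation of the integral of f over 0 < t < T."""
    h = T / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))

T = 1.0
area = [integral(lambda t, n=n: weighting(n, t, T), T) for n in range(3)]  # each ~ 1
noise_uniform = integral(lambda t: weighting(0, t, T) ** 2, T)      # ~ 1/T
noise_parabolic = integral(lambda t: weighting(1, t, T) ** 2, T)    # ~ 6/(5T)
efficiency = math.sqrt(noise_uniform / noise_parabolic)            # ~ 0.913
print(area, noise_uniform, noise_parabolic, round(efficiency, 3))
```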


Chapter 11 


THE THEORY of "smoothing a constant" developed in the preceding chapter will be extended in this chapter to the problem of smoothing a polynomial function of time of any prescribed degree. The extension is, however, restricted to the case of a flat noise spectrum. In addition to the smoothing problem, the analysis also provides a way of designing a network which will extrapolate the polynomial a given distance $t_f$ into the future. The network is so arranged that $t_f$ is continuously variable. In addition, the degree of the polynomial can readily be changed to fit changes in the complexity of the assumed form of the data, apart from noise.

It is clear that these results amount, in a certain sense, to an alternative to Wiener's method for the design of prediction circuits for general time series. Thus, to predict a time series of any given complexity we would need only to begin with a polynomial of sufficiently high degree to fit the observed data, and extrapolate. Aside from the restriction to a flat noise spectrum, perhaps the most obvious difference from Wiener's method is the fact that the settling time restriction limits the data upon which the prediction rests to a finite interval in the past. To advance such a prediction theory seriously, however, it would be necessary to go much farther into the way in which the degree of the polynomial is established and the justification for assuming that the extrapolated value represents a probable future value for the function.ᵃ

This general discussion will not be undertaken here. Since prediction with high degree polynomials will certainly be sensitive to minor irregularities in the data, tracking errors would necessarily limit the application of the method in any case. If we confine ourselves to reasonably low degree polynomials, however, the method is useful. An example is furnished by the prediction of airplane position, in rectangular coordinates, by quadratic functions of time. Here the square terms represent the effects of accelerations in the various coordinates. We can defend the inclusion of such terms on the ground that it is plausible to assume that an airplane may experience constant accelerations, due to turns, the force of gravity, etc., for considerable periods of time. The linear term represents plane velocity and needs no defense. The constant term, of course, gives the plane position at some reference time. Including it in the smoothing operation is equivalent to introducing "present-position" smoothing of the sort suggested by the broken lines in Figure 1 of Chapter 7.ᵇ

ᵃ As an example of possible difficulties we may notice the fact that two polynomials of different degree which approximate a given function as closely as possible, in a least squares sense, in a prescribed interval frequently differ radically outside that interval.

Aside from its direct interest as a possible prediction method, the analysis in this chapter is also of indirect interest for the additional light it sheds on the effect of the noise spectrum on smoothing functions. It turns out that smoothing a power of time, with a flat noise spectrum, is equivalent to smoothing a constant with a somewhat different noise spectrum. Thus the smoothing functions developed for polynomials are also useful as special cases of smoothing functions applicable to constants.


Let λ be any past value of time and let t be the present value. If the data is fitted with a smooth curve $\bar E(\lambda)$, the predicted value may be taken as $\bar E(t + t_f)$. The procedure of fitting is the familiar one of minimizing the integral

$$\int_{-\infty}^{t} [\bar E(\lambda) - E(\lambda)]^2\, W_0(t - \lambda)\, d\lambda \qquad (1)$$

with respect to disposable parameters in $\bar E(\lambda)$ and a prescribed weighting function $W_0(t - \lambda)$. The lower limit of the integral is indicated as $-\infty$ in compliance with the physical impossibility of discriminating between relevant and irrelevant data, with fixed linear networks, except on the basis of age. The burden of discrimination must be relegated to the weighting function, which must be a function only of the age $t - \lambda$. Under the ideal restriction that $W_0(t - \lambda)$ is identically zero when $t - \lambda > T$, that is, when $\lambda < t - T$, the indicated lower limit of the integral is purely nominal.

ᵇ In the circuit of Figure 1, Chapter 7, however, the smoothing network would produce a lag in the present-position data delivered to the prediction circuit, and this lag would, of course, mean some error in following a moving target. In the method described in this chapter such lags are automatically compensated for by adjustments in the coefficients of the other terms of the polynomial.

As in Section 10.2, it is convenient to conduct the analysis in terms of the age variable $\tau = t - \lambda$ introduced there. If

$$\bar F(\tau) = \bar E(\lambda), \qquad F(\tau) = E(\lambda),$$

the integral (1) to be minimized may be expressed in the form

$$\int_0^\infty [\bar F(\tau) - F(\tau)]^2\, W_0(\tau)\, d\tau .$$

In accordance with the discussion of quasi-distortionless transmission networks in Section A.10, Appendix A, the smooth curve $\bar E(\lambda)$ should be a polynomial in λ. Hence $\bar F(\tau)$ should be a polynomial in τ. It will be more convenient, however, to express $\bar F(\tau)$ formally as a linear combination of polynomials in τ which may be orthogonalized. Hence, let

$$\bar F(\tau) = V_0 + V_1 G_1(\tau) + V_2 G_2(\tau) + \cdots + V_n G_n(\tau), \qquad (2)$$

where $G_m(\tau)$ is an mth degree polynomial in τ. Let $W_0(\tau)$ be normalized in the sense that

$$\int_0^\infty W_0(\tau)\, d\tau = 1,$$

and the $G_m(\tau)$ be orthogonalized with respect to the weighting function $W_0(\tau)$ in the sense that

$$\int_0^\infty G_l(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau = \begin{cases} 0 & \text{if } l \ne m, \\ 1/k_m & \text{if } l = m, \end{cases} \qquad (G_0 = 1,\ k_0 = 1).$$

The integral (1) is then a minimum with respect to the $V_m$'s in (2) if

$$V_m = k_m \int_0^\infty F(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau . \qquad (3)$$

In terms of the forward time λ, (2) and (3) reduce to

$$\bar E(\lambda) = V_0(t) + V_1(t)\, G_1(t - \lambda) + V_2(t)\, G_2(t - \lambda) + \cdots + V_n(t)\, G_n(t - \lambda), \qquad (4)$$

$$V_m(t) = k_m \int_{-\infty}^{t} E(\lambda)\, G_m(t - \lambda)\, W_0(t - \lambda)\, d\lambda . \qquad (5)$$

Expression (5) identifies the $V_m(t)$ as the responses to E(λ) of fixed linear networks whose impulsive admittances are

$$W_m(\tau) = k_m\, G_m(\tau)\, W_0(\tau). \qquad (6)$$

By (4), the predicted value may be obtained by a linear combination of the responses of these networks, viz.,

$$\bar E(t + t_f) = V_0(t) + G_1(-t_f)\, V_1(t) + G_2(-t_f)\, V_2(t) + \cdots + G_n(-t_f)\, V_n(t). \qquad (7)$$

A schematic representation of an nth order smoothing and prediction circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are represented as potentiometer factors dependent on the time of flight.

Figure 1. Schematic representation of nth order smoothing and prediction circuit.

Alternatively, (7) may be written

$$\bar E(t + t_f) = \bar E(t) + [G_1(-t_f) - G_1(0)]\, V_1(t) + \cdots + [G_n(-t_f) - G_n(0)]\, V_n(t), \qquad (8)$$

where $\bar E(t)$ is then replaced by E(t) when position data smoothing is to be omitted.
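The following is a minimal discrete sketch of (3) and (7). It rests on assumptions the text does not fix at this point. The smoothing time is taken as unity. $W_0$ is taken as uniform on $0 < \tau < 1$, the case worked out later in this chapter, for which $G_m(\tau) = (-1)^m \frac{m!}{(2m)!} P_m(2\tau - 1)$. The integrals are replaced by midpoint sums. The function names are hypothetical. The values `G(m, -t_f)` play the role of the potentiometer factors. For a noiseless quadratic track and n = 2, the prediction should be exact. With noisy data, the same code would return the smoothed extrapolation.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Legendre

def G(m, tau):
    """G_m(tau) = (-1)^m m!/(2m)! P_m(2 tau - 1): orthogonal for uniform W_0 on 0 < tau < 1."""
    return (-1) ** m * factorial(m) / factorial(2 * m) * Legendre.basis(m)(2 * np.asarray(tau) - 1)

def k(m):
    """Normalizing constant k_m = 1 / (integral of G_m^2 W_0)."""
    return factorial(2 * m) * factorial(2 * m + 1) / factorial(m) ** 2

def predict(F, n, t_f, samples=20_000):
    """Evaluate (3) by a midpoint sum, then combine the V_m as in (7).

    F(tau) = E(t - tau) is the past data as a function of age (unit smoothing time).
    """
    dtau = 1.0 / samples
    tau = (np.arange(samples) + 0.5) * dtau
    W0 = np.ones_like(tau)                     # uniform weighting function
    V = [k(m) * np.sum(F(tau) * G(m, tau) * W0) * dtau for m in range(n + 1)]
    return sum(G(m, -t_f) * V[m] for m in range(n + 1))

# Hypothetical noiseless quadratic track E(lam) = 3 + 2 lam - 0.5 lam^2, present time t = 0,
# so F(tau) = E(-tau).
F = lambda tau: 3 + 2 * (-tau) - 0.5 * tau**2
prediction = predict(F, n=2, t_f=0.7)          # exact value E(0.7) = 4.155
```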

It is not necessary that the $G_m(\tau)$ polynomials be orthogonal. However, the circuit switching required to reduce or increase the order of the prediction is simplest when the $G_m(\tau)$ polynomials are orthogonal. In that case each $V_m$ in (3) is determined independently of the others, so adding or dropping a term leaves the remaining $V_m$ unchanged. Orthogonal polynomials corresponding to any weighting function $W_0(\tau)$ are readily derived by well-known methods.
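One such well-known method is Gram-Schmidt orthogonalization of $1, \tau, \tau^2, \ldots$ with respect to the inner product $\int G_l G_m W_0\, d\tau$. It needs only the moments of $W_0$. The sketch below uses exact rational arithmetic, and its function names are hypothetical. It also scales each $G_m$ so that its leading coefficient is $(-1)^m/m!$, the normalization adopted later in this chapter. For the uniform $W_0$ on $0 < \tau < 1$, whose moments are $1/(i+1)$, it should reproduce the polynomials tabulated later and $k_m = (2m)!\,(2m+1)!/(m!)^2$.

```python
from fractions import Fraction
from math import factorial

def poly_mul(a, b):
    """Multiply polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def inner(a, b, moments):
    """<a, b> = integral of a(tau) b(tau) W_0(tau) d tau, computed from the moments of W_0."""
    return sum(c * moments[i] for i, c in enumerate(poly_mul(a, b)))

def orthogonal_polynomials(n, moments):
    """Gram-Schmidt on 1, tau, ..., tau^n; G_m is scaled to leading coefficient (-1)^m / m!.

    Returns the G_m (ascending coefficients) and k_m = 1 / <G_m, G_m>.
    """
    G, k = [], []
    for m in range(n + 1):
        g = [Fraction(0)] * m + [Fraction(1)]            # start from tau^m
        for prev, k_prev in zip(G, k):
            c = inner(g, prev, moments) * k_prev         # projection coefficient
            g = [gi - c * (prev[i] if i < len(prev) else 0) for i, gi in enumerate(g)]
        scale = Fraction((-1) ** m, factorial(m))       # leading coefficient of g is 1 here
        g = [scale * gi for gi in g]
        G.append(g)
        k.append(1 / inner(g, g, moments))
    return G, k

# Uniform W_0 on 0 < tau < 1: the ith moment is 1/(i + 1).
G, k = orthogonal_polynomials(3, [Fraction(1, i + 1) for i in range(7)])
print(G[2], k[2])   # coefficients of tau^2/2 - tau/2 + 1/12, and 720
```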

The weighting function $W_0(\tau)$ may be determined by either of the methods described in Appendix B as the best weighting function for smoothing position data, under prescribed tracking error characteristics. Then the best impulsive admittances $W_m(\tau)$ for a smoothing and prediction circuit are prescribed by (6).

The relationship (6) shows that if the prescribed weighting function $W_0(\tau)$ satisfies the formal requirements for physical realizability, so will all of the impulsive admittances $W_m(\tau)$. Of the standard sets of orthogonal polynomials those of Laguerre appear to be the best adapted to physical realization. The Laguerre polynomials $L_m^{(\alpha)}(\tau)$ are orthogonal in $0 < \tau < \infty$ with the weighting function $\tau^\alpha e^{-\tau}$. However, such a weighting function is, in general, very unsatisfactory from the practical point of view of settling characteristics.

It is possible, of course, to approximate any prescribed weighting function $W_0(\tau)$ as closely as may be desired in a physically realizable form, derive a set of orthogonal polynomials based on the approximate form, and determine the impulsive admittances $W_m(\tau)$ from (6). However, such a procedure leads to complexities of network configuration which increase very rapidly with the index m. This increasing complexity is hardly justifiable in practice.

From the foregoing considerations, it appears that the most practical procedure is to derive all of the impulsive admittances $W_m(\tau)$ without regard to physical realizability, approximate them independently in physically realizable forms of independently prescribed complexities, and modify or redetermine the potentiometer factors in accordance with the discussion in Section A.10, Appendix A.


The impulsive admittances defined by (6) for m > 0 may not be regarded as weighting functions even though the response of the corresponding networks to E(λ) is, by (5),

$$V_m(t) = \int_0^\infty E(t - \tau)\, W_m(\tau)\, d\tau,$$

because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will presently be seen, cannot be normalized. The term weighting function is reserved for the functions defined by (11) below.

Since $\tau^r$ is a linear combination of the $G_s(\tau)$, where $s = 0, 1, \ldots, r$, it is obvious from (6) and the orthogonality of the $G_m(\tau)$ that

$$\int_0^\infty \tau^r\, W_m(\tau)\, d\tau = 0 \quad \text{when } r < m .$$

In particular

$$\int_0^\infty W_m(\tau)\, d\tau = 0 \quad \text{when } m > 0 .$$

Since the transmission function $Y_m(p)$ of a network is the Laplace transform of its impulsive admittance (see Section A.3), we have

$$Y_m(p) = \int_0^\infty W_m(\tau)\, e^{-p\tau}\, d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^\infty \tau^r\, W_m(\tau)\, d\tau . \qquad (9)$$

The first m terms in this series vanish. Hence $Y_m(p)$ will be of the form

$$Y_m(p) = p^m\, y_m(p), \qquad (10)$$

where $y_m(0) \ne 0$. This permits us to regard the network whose impulsive admittance is $W_m(\tau)$ as an instantaneous mth order differentiator, corresponding to the factor $p^m$ in (10), in tandem with a purely smoothing network whose transmission function is $y_m(p)$.

It is convenient to associate a weighting function $w_m(\tau)$ with the purely smoothing network whose transmission function is $y_m(p)$. Dividing (10) through by $p^m$, the resulting operational equation may be interpreted (see Section A.5) to mean that the weighting function $w_m(\tau)$ is the m-fold integral of the impulsive admittance $W_m(\tau)$ between the limits 0 and τ. This is expressed by

$$w_m(\tau) = \int_0^\tau \cdots \int_0^\tau W_m(\tau)\, (d\tau)^m . \qquad (11)$$

By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it follows from $y_m(0) \ne 0$ that

$$\int_0^\infty w_m(\tau)\, d\tau \ne 0 .$$




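Hence the $w_m(\tau)$ may be normalized in the sense that

$$\int_0^\infty w_m(\tau)\, d\tau = 1$$

for all values of m. However, this may be done in general only if the $G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$ for any value of m > 0. It is in fact readily shown that the coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of $\tau^m$ in $e^{-\tau}$. For, by (9), $y_m(0) = \frac{(-1)^m}{m!}\int_0^\infty \tau^m W_m(\tau)\, d\tau$. By (6) and the orthogonality of the $G_m(\tau)$, this integral equals the reciprocal of the leading coefficient of $G_m(\tau)$. Hence $y_m(0) = 1$ requires that leading coefficient to be $(-1)^m/m!$.

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the range $-1 < x < 1$ and uniform weighting. In other words, the polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range $0 < \tau < \infty$ and the weighting functionᶜ

$$W_0(\tau) = 1 \ \text{when } 0 \le \tau \le 1, \qquad W_0(\tau) = 0 \ \text{when } \tau > 1 .$$

ᶜ The unit of time being equal to the nominal smoothing time.

It is known from Section 10.4 that this form for the weighting function $W_0(\tau)$ is best in case the tracking errors are flat random noise. In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should then be

$$G_m(\tau) = (-1)^m\, \frac{m!}{(2m)!}\, P_m(2\tau - 1) .$$

The first few of these are tabulated below.

$$\begin{array}{c|l} m & G_m(\tau) \\ \hline 1 & \tfrac{1}{2} - \tau \\ 2 & \tfrac{\tau^2}{2} - \tfrac{\tau}{2} + \tfrac{1}{12} \\ 3 & -\tfrac{\tau^3}{6} + \tfrac{\tau^2}{4} - \tfrac{\tau}{10} + \tfrac{1}{120} \end{array}$$

With the help of the formula

$$\int_{-1}^{1} [P_m(x)]^2\, dx = \frac{2}{2m + 1}$$

it is readily determined that

$$\frac{1}{k_m} = \int_0^\infty [G_m(\tau)]^2\, W_0(\tau)\, d\tau = \frac{(m!)^2}{(2m)!\,(2m + 1)!} .$$

Then, by (6),

$$W_m(\tau) = (-1)^m\, \frac{(2m + 1)!}{m!}\, P_m(2\tau - 1), \quad 0 \le \tau \le 1; \qquad W_m(\tau) = 0, \quad \tau > 1 .$$

Substituting this in turn into (11) and making use of Rodrigues' formula

$$P_m(x) = \frac{(-1)^m}{2^m\, m!}\, \frac{d^m}{dx^m}(1 - x^2)^m, \qquad \text{or} \qquad P_m(2\tau - 1) = \frac{(-1)^m}{m!}\, \frac{d^m}{d\tau^m}[\tau(1 - \tau)]^m,$$

it is finally found that

$$w_m(\tau) = \frac{(2m + 1)!}{(m!)^2}\, [\tau(1 - \tau)]^m, \quad 0 \le \tau \le 1; \qquad w_m(\tau) = 0, \quad \tau > 1 .$$

By a relationship of the form of (9) the transmission functions $y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be determined. The first three are

$$y_0(p) = \frac{1 - e^{-p}}{p}, \qquad y_1(p) = \frac{6}{p^3}\left[(p - 2) + (p + 2)\, e^{-p}\right], \qquad y_2(p) = \frac{60}{p^5}\left[(p^2 - 6p + 12) - (p^2 + 6p + 12)\, e^{-p}\right].$$

These may be written in the form

$$y_m(i\omega) = Q_m(x)\, e^{-ix}, \qquad x = \frac{\omega}{2}, \qquad (13)$$

where

$$Q_0(x) = \frac{\sin x}{x}, \qquad Q_1(x) = \frac{3}{x^2}\left(\frac{\sin x}{x} - \cos x\right), \qquad Q_2(x) = \frac{15}{x^4}\left[(3 - x^2)\frac{\sin x}{x} - 3\cos x\right], \qquad (14)$$

or in the infinite power-series form

$$y_m(p) = \frac{(2m + 1)!}{m!}\sum_{n=0}^{\infty} \frac{(n + m)!}{n!\,(n + 2m + 1)!}\,(-p)^n, \qquad \text{e.g.,} \quad y_2(p) = 60\sum_{n=0}^{\infty}\frac{(n + 1)(n + 2)}{(n + 5)!}\,(-p)^n . \qquad (15)$$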
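The chain of results above can be checked numerically for m = 0, 1, 2. Three properties are verified: the low-order moments of $W_m$ vanish, as required by (10); the m-fold integral (11) of $W_m$ reproduces the closed form of $w_m(\tau)$; and the series (15) agrees with the closed forms of $y_m(p)$. The sketch assumes unit smoothing time. It uses NumPy's `Legendre` series class with domain [0, 1] to represent $P_m(2\tau - 1)$, and 20-point Gauss-Legendre quadrature for the moments. It is a consistency check, not part of the original derivation, and the helper names are illustrative.

```python
import math
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

# Gauss-Legendre nodes and weights mapped from [-1, 1] to 0 < tau < 1.
x, wts = leggauss(20)
tau, wts = (x + 1) / 2, wts / 2

def W(m):
    """W_m(tau) = (-1)^m (2m+1)!/m! P_m(2 tau - 1) on 0 <= tau <= 1, as a series object."""
    coeff = (-1) ** m * math.factorial(2 * m + 1) / math.factorial(m)
    return coeff * Legendre.basis(m, domain=[0, 1])

def w(m, t):
    """Closed form of w_m: (2m+1)!/(m!)^2 [tau (1 - tau)]^m."""
    return math.factorial(2 * m + 1) / math.factorial(m) ** 2 * (t * (1 - t)) ** m

def y_series(m, p, terms=40):
    """Power series (15) for y_m(p)."""
    c = math.factorial(2 * m + 1) / math.factorial(m)
    return c * sum(math.factorial(n + m) / (math.factorial(n) * math.factorial(n + 2 * m + 1))
                   * (-p) ** n for n in range(terms))

def y_closed(m, p):
    """Closed forms of y_0, y_1, y_2."""
    e = math.exp(-p)
    if m == 0:
        return (1 - e) / p
    if m == 1:
        return 6 / p**3 * ((p - 2) + (p + 2) * e)
    return 60 / p**5 * ((p**2 - 6 * p + 12) - (p**2 + 6 * p + 12) * e)

results = {}
for m in range(3):
    Wm = W(m)
    # Moments of order 0..m; those below order m should vanish.
    moments = [float(np.sum(wts * tau**r * Wm(tau))) for r in range(m + 1)]
    # m-fold integral from 0, as in (11); it should reproduce w_m.
    wm = Wm.integ(m, lbnd=0) if m > 0 else Wm
    max_err = float(np.max(np.abs(wm(tau) - w(m, tau))))
    results[m] = (moments, max_err, y_series(m, 1.0), y_closed(m, 1.0))
```

The moment of order m comes out as $(-1)^m m!$, which is the normalization $y_m(0) = 1$ expressed through (9).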

Methods for obtaining physically realizable approximations to the weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$, based upon the Q functions (14) and the series expansions (15), are described in Chapter 12.


Chapter 12 


This chapter will be devoted to a brief review of some of the methods and techniques which have been used in the physical realization of data-smoothing or weighting functions. The first two sections will be devoted to methods for determining physically realizable approximations to a desired weighting function. The third section takes up the use of feedback amplifiers and servomechanisms in order to avoid the use of coils of generally fantastic sizes. The final section takes up the design of resistance-capacitance networks.

Methods of deriving physically realizable approximations of best weighting functions may be divided into two classes, which may be called, for convenience, t-methods and p-methods. The t-methods are those in which a prescribed best weighting function W(t) is approximated directly by a function $W_a(t)$ of realizable form, viz., a sum of decaying exponential terms and exponentially decaying sinusoidal terms. However, the t-methods are most useful when the approximation is restricted to a sum only of exponential terms. According to the discussion in Section A.9, Appendix A, such a restriction corresponds physically to passive RC transmission networks. A t-method was used by Phillips and Weiss in the reference quoted in Section 10.3 to obtain an approximation with one decaying exponential term and one exponentially decaying sinusoidal term. However, this method rapidly becomes unwieldy as the number of terms is increased.

The p-methods are those in which the approximation is derived indirectly from the transmission function Y(p) corresponding to W(t). A rational function $Y_a(p)$ approximating Y(p) is first determined. If it is realizable, and it usually is, then $W_a(t) = L^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as well as simple exponentials. This gives the p-methods a considerable advantage over the t-methods in more efficient use of network elements. The fact that this generally calls for impractical element values in passive RLC networks is not serious. As shown in Section 12.3, the use of coils may be avoided entirely by the use of feedback amplifiers.

12.1 t-METHODS
To describe the t-method,ᵃ let

$$W_a(t) = A_1 e^{-a_1 t} + A_2 e^{-a_2 t} + \cdots + A_n e^{-a_n t}, \qquad (1)$$

where the a's are prescribed and the A's are to be determined. Two considerations are involved in the determination of the A's. The first consideration is based on the relationship between the continuity conditions at t = 0 and the ultimate slope of the loss characteristic as expressed in the theorem in Section A.8. Accordingly, a number of relations of the type

$$\begin{aligned} A_1 + A_2 + \cdots + A_n &= 0 \\ a_1 A_1 + a_2 A_2 + \cdots + a_n A_n &= 0 \\ &\ \ \vdots \\ a_1^r A_1 + a_2^r A_2 + \cdots + a_n^r A_n &= 0, \qquad r < n - 1 \end{aligned} \qquad (2)$$

must be satisfied. These state that $W_a(t)$ and its first r derivatives vanish at t = 0. This leaves n - r - 1 of the A's for the second consideration.

The second consideration concerns the manner in which the approximation in the range t > 0 is to be made. The approximation may, for example, be required to pass through n - r - 1 points on W(t), or the first n - r - 1 moments of the approximation may be required to be equal to the corresponding moments of W(t). The latter is expressed by relations of the type

$$\frac{A_1}{a_1^s} + \frac{A_2}{a_2^s} + \cdots + \frac{A_n}{a_n^s} = \frac{1}{(s - 1)!}\int_0^\infty W(t)\, t^{s-1}\, dt, \qquad s = 1, 2, \ldots, n - r - 1. \qquad (3)$$

Foster's investigations were concerned only with the parabolic weighting function (4), Chapter 10, so that only the first of (2) was involved. Numerical studies led to the belief that, with a given number of a's, the best approximation was to be had from the case in which all of the a's are equal. Hence the natural center of attention was the special form

$$W_a(t) = (A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1})\, e^{-at}. \qquad (4)$$

ᵃ The t-method is principally due to R. M. Foster.

At large values of t this expression reduces approximately to the last term, and if it is assumed that $A_{n-1} = 1$, the settling condition fixes a to at least a first approximation. The rest of the work of approximating the parabola is then equivalent to a problem in polynomial approximation. Once the A's are determined, a better value of a can be found from the settling condition, and the process gone through again.

If the a's are only approximately equal, the approximation will still behave approximately like (4) with an average value used for a. The difficulty with equal or nearly equal a's is that it leads to networks with extreme element values. In order to secure satisfactory element values, it is generally necessary to depart substantially from the condition of equal a's. This results in some, but not a large, loss of efficiency in approximating the parabola. Foster recommends that the a's be chosen as a geometric series, with their geometric mean more or less around the equivalent point for equal a's. With four a's he suggests that the constant ratio in the series may be 3:2, whereas with only two a's the ratio should be raised to 2:1. These are, however, only rough values and obviously depend on individual opinion of what constitutes an unreasonable element value.

As a matter of experience, it turns out that the characteristic first obtained usually has a rather long and slowly decaying tail, as shown in Figure 1. This, of course, is equivalent to a correspondingly long "settling time," or time before a useful prediction can be made. In practice, therefore, after the preliminary design has been found, adjustments are made to bring the tail of the curve under control, partly by modifying the values of the A's slightly, and partly by contracting the time scale to bring the part of the tail which remains appreciable within the allowable settling time limits. This leads to the somewhat lopsided match to the parabola shown in Figure 2.

Figure 1. Approximation to parabolic weighting function, showing poor settling characteristic.

Figure 2. Approximation to parabolic weighting function, showing better settling characteristic.

A method of bringing the tail of the curve under controlᵇ is to minimize the expression

$$\int_T^\infty [W_a(t)]^2\, dt = \sum_{l}\sum_{m} C_{lm}\, A_l A_m, \qquad C_{lm} = \frac{e^{-(a_l + a_m)T}}{a_l + a_m},$$

under the restrictions (2) and all but the last of (3).
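The minimization is a quadratic form in the A's subject to linear equality constraints, so a Lagrange-multiplier (KKT) linear system solves it directly. The sketch below assumes T = 1, the parabolic weighting function of unit smoothing time, r = 0, and a hypothetical set of four a's. It compares the result with the solution that also imposes the last of (3). It also checks the closed form of $C_{lm}$ against a direct quadrature of the tail. The helper names are illustrative.

```python
import math
import numpy as np

def tail_matrix(a, T):
    """C_lm = exp(-(a_l + a_m) T) / (a_l + a_m), so that A^T C A = integral_T^inf W_a^2 dt."""
    s = np.add.outer(a, a)
    return np.exp(-s * T) / s

def min_tail(a, r, moments, T):
    """Minimize A^T C A subject to (2) (orders 0..r) and all but the last of (3).

    Lagrange multipliers give the linear system
        [2C  M^T] [A  ]   [0]
        [M   0  ] [lam] = [b].
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    rows = [a ** j for j in range(r + 1)] + [a ** (-s) for s in range(1, n - r - 1)]
    b = [0.0] * (r + 1) + [moments[s - 1] for s in range(1, n - r - 1)]
    M = np.array(rows)
    kc = len(rows)
    K = np.block([[2 * tail_matrix(a, T), M.T], [M, np.zeros((kc, kc))]])
    return np.linalg.solve(K, np.r_[np.zeros(n), b])[:n]

a = np.array([2.0, 3.0, 4.5, 6.75])        # hypothetical a's, ratio 3:2
T = 1.0
moments = [6.0 / (math.factorial(s - 1) * (s + 1) * (s + 2)) for s in range(1, 4)]  # parabola
A_tail = min_tail(a, 0, moments, T)

# For comparison: the solution that also imposes the last of (3).
M_full = np.array([a ** 0] + [a ** (-s) for s in range(1, 4)])
A_full = np.linalg.solve(M_full, np.r_[0.0, moments])

C = tail_matrix(a, T)
tail_min = float(A_tail @ C @ A_tail)
tail_full = float(A_full @ C @ A_full)

# Direct midpoint quadrature of the tail integral over T < t < T + 20.
steps = 200_000
h = 20.0 / steps
t = T + (np.arange(steps) + 0.5) * h
tail_numeric = float(h * np.sum((np.exp(-np.outer(t, a)) @ A_tail) ** 2))
```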

The t-method used by Phillips and Weiss is based on a 3-term approximation of the form (1) in which one a is real while the other two may be conjugate complex. The a's are not prescribed, so that there are six parameters to be determined. Four restrictions are imposed, viz., the first of (2), the first of (3), a restriction on the value of the tail area, viz.,

$$\int_T^\infty W_a(t)\, dt = \sum_{i=1}^{3} \frac{A_i}{a_i}\, e^{-a_i T},$$

and the cross-over condition

$$W_a(T) = 0 .$$

Finally, the transmitted noise power, which, under the assumption of flat random noise associated with the position data, takes the form (see Section 10.4)

$$\int_0^\infty [W_a(t)]^2\, dt ,$$

is minimized with respect to the two remaining parameters by numerical methods.

ᵇ Used by R. F. Wick.








Three p-methods have been used. These will 
be described in chronological order. 

The first p-method is one which was used by R. L. Dietzold in exploiting the use of feedback amplifiers to secure the advantages of approximations with complex exponentials. The transmission function Y(p) corresponding to the best weighting function W(t) is first formulated. The loss characteristic, $-20\log_{10}|Y(i\omega)|$, is next computed and plotted against the frequency on a logarithmic scale. Then standard equalizer design techniques are employed to approximate the loss characteristic, keeping in mind that the transmission loss in the feedback network of a feedback amplifier becomes a transmission gain for the circuit as a whole.

The second p-method is merely a more complete analytic formulation of the first, thereby avoiding the necessity for employing equalizer design techniques. It depends upon the possibility of expressing the transmission function corresponding to the best weighting function in the form of equation (13), Chapter 11. That form is associated with the symmetry of the weighting function, as shown in Section A.7. The method is based upon the determination of the envelope of the Q-function. The Q-function is first differentiated in order to obtain the equation which determines the values of ω at which the maxima and minima occur. This transcendental equation is not solved but is used to eliminate the trigonometric functions in the expression of the Q-function. The resulting expression, which is an irrational function of $\omega^2$, is then squared in order to make it a rational function of $\omega^2$. The substitution $p^2 = -\omega^2$ is made and the expression is then resolved into two factors, of which one contains all the poles with negative real parts while the other contains all the poles with positive real parts, the two factors being conjugate complex when $p = i\omega$. The first factor is then taken as an approximation of the desired transmission function. Applying the method to the desired transmission functions defined by (13) and (14) of Chapter 11, we get

$$y_0(p) \approx \frac{2}{2 + p}, \qquad y_1(p) \approx \frac{12}{12 + 6p + p^2}, \qquad y_2(p) \approx \frac{120}{120 + 60p + 12p^2 + p^3} .$$

This last is the basis for the design of a position and rate smoothing circuit for a proposed computor for controlling bombers from the ground.¹¹ This design is described briefly in Chapter 13.
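For $y_0$ the envelope procedure can be followed by hand. Take $x = \omega/2$, assuming unit smoothing time. The extrema of $Q_0 = \sin x / x$ satisfy $\tan x = x$, and at those points $Q_0 = \cos x$. The squared envelope is therefore $1/(1 + x^2) = 4/(4 + \omega^2)$. Putting $\omega^2 = -p^2$ gives $4/[(2 + p)(2 - p)]$, whose left-half-plane factor is $2/(2 + p)$. The sketch below checks numerically that $|2/(2 + i\omega)|^2$ touches $Q_0^2$ at its first few extrema. The helper names are hypothetical.

```python
import math

def Q0(x):
    return math.sin(x) / x

def extremum(k):
    """Root of tan x = x in (k pi, k pi + pi/2), k >= 1, by bisection on sin x - x cos x."""
    f = lambda x: math.sin(x) - x * math.cos(x)
    lo, hi = k * math.pi + 1e-9, k * math.pi + math.pi / 2 - 1e-9
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def approx_gain_sq(omega):
    """|2 / (2 + i omega)|^2 for the approximation y_0(p) ~ 2/(2 + p)."""
    return abs(2 / (2 + 1j * omega)) ** 2

extrema = [extremum(k) for k in range(1, 6)]
for x in extrema:
    # At each extremum of Q0, the envelope 1/(1 + x^2) and the rational approximation agree.
    print(round(x, 6), Q0(x) ** 2, 1 / (1 + x * x), approx_gain_sq(2 * x))
```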

The third p-method is based upon the ascending power-series expansion of the transmission function corresponding to the best weighting function. Examples of such power series are given by (15) of Chapter 11. The method of approximation is one which is credited to Padé in O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is referred to, it will be seen to be also a method of moments.

The method consists in determining the coefficients in a rational function of the form

$$\frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \qquad (7)$$

so that the ascending power-series expansion of the rational function will agree with that of the best transmission function, term for term, up to and including $p^{m+n}$. If the series for the best transmission function is

$$1 + c_1 p + c_2 p^2 + \cdots + c_{m+n} p^{m+n} + \cdots \qquad (8)$$

the equations which determine the coefficients in (7) are obtained by equating coefficients of corresponding powers of p, up to and including the (m + n)th, in

$$(1 + b_1 p + \cdots + b_n p^n)(1 + c_1 p + \cdots + c_{m+n} p^{m+n}) = 1 + a_1 p + \cdots + a_m p^m .$$

The last n equations will be homogeneous in the b's and c's.
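The equations are linear, so the method is easy to mechanize. The sketch below works in exact rational arithmetic, and its function names are hypothetical. It solves the last n equations for the b's, then reads off the a's from the first m + 1. The first test case is the series (15) for $y_2(p)$ with m = 0 and n = 3, which is the choice the rule stated below gives for $w_2(\tau) \sim \tau^2$. It yields $y_2(p) \approx 84/(84 + 42p + 9p^2 + p^3)$. The Padé [1/1] approximant of $e^{-p}$ serves as a second check.

```python
from fractions import Fraction
from math import factorial

def solve(M, v):
    """Exact Gauss-Jordan elimination over the rationals (M square and nonsingular)."""
    n = len(M)
    A = [list(row) + [rhs] for row, rhs in zip(M, v)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def pade(c, m, n):
    """Numerator a_0..a_m and denominator b_0..b_n of (7), matching sum c_k p^k through p^(m+n)."""
    cc = lambda k: c[k] if k >= 0 else Fraction(0)
    # Powers m+1 .. m+n have zero right-hand side: sum_{j=1}^{n} b_j c_{k-j} = -c_k.
    M = [[cc(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + solve(M, [-cc(k) for k in range(m + 1, m + n + 1)])
    a = [sum(b[j] * cc(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b

# Series (15) for y_2(p): c_k = 60 (k+1)(k+2) (-1)^k / (k+5)!
c_y2 = [Fraction(60 * (k + 1) * (k + 2) * (-1) ** k, factorial(k + 5)) for k in range(8)]
a_y2, b_y2 = pade(c_y2, 0, 3)        # denominator 1 + p/2 + 3p^2/28 + p^3/84

# Check: e^(-p) ~ (1 - p/2) / (1 + p/2).
c_exp = [Fraction((-1) ** k, factorial(k)) for k in range(4)]
a_exp, b_exp = pade(c_exp, 1, 1)
```

The denominator $84 + 42p + 9p^2 + p^3$ satisfies the Routh-Hurwitz condition $9 \cdot 42 > 84$, so its poles lie in the left half-plane.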

It has been expedient in some cases to omit the last few of the (m + n) equations in order to have some control over the number of real roots and poles and the number of conjugate pairs of complex roots and poles in the resulting rational function.

In the assumed rational expression (7) the difference $n - m$ should be chosen so that the ultimate slope of the loss characteristic will be the same as for the best transmission function. According to the theorem in Section A.8, if $W(t)$ behaves like $t^r$ as $t \to 0$, we should take $n - m = r + 1$. As a matter of experience the rational expression has invariably turned out to be physically realizable whenever this "rule" was followed. Frequently, however, the rational expression has also turned out to be physically realizable under small departures from the rule.
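The link between n - m and the ultimate slope can be checked numerically. At high frequencies |Y(iω)| falls off as ω raised to the power -(n - m), so the loss rises by 20 log₁₀ 2 ≈ 6.02 dB per octave for each unit of n - m. The sketch below measures the loss increase between ω and 2ω for a rational function given by ascending coefficients. As an illustrative choice it uses the fourth-degree denominator selected in Section 13.1.1, with and without the factor p².

```python
import math


def poly_asc(coeffs, p):
    """Evaluate a0 + a1 p + a2 p^2 + ... at a (complex) point p."""
    return sum(c * p**k for k, c in enumerate(coeffs))


def loss_slope_db_per_octave(num, den, omega):
    """Increase in loss, in dB, from omega to 2*omega for Y(p) = num/den."""
    def gain(w):
        p = 1j * w
        return abs(poly_asc(num, p) / poly_asc(den, p))
    return 20 * math.log10(gain(omega) / gain(2 * omega))


den = [1, 1 / 2, 3 / 28, 1 / 84, 1 / 3528]   # realized y(p) of Section 13.1.1
slope_y = loss_slope_db_per_octave([1], den, 1e4)
slope_p2y = loss_slope_db_per_octave([0, 0, 1], den, 1e4)
print(f"y(p): {slope_y:.2f} dB/octave, p^2 y(p): {slope_p2y:.2f} dB/octave")
```

Here y(p) alone has n - m = 4 and falls at about 24 dB per octave, while p²y(p) falls at about 12 dB per octave, the figure required for the second-derivative circuit.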

Examples of this method are given in Chap- 
ter 13. 


In this section we shall describe the use of 
feedback amplifiers and servomechanisms to 
obtain desired transmission functions. For com- 
plete discussions of the most recent technical 
advances in the analysis and design of feedback 
amplifiers and servomechanisms the reader 
should consult some of the modern literature 
on these subjects. 2 3 - 51sl61T 

Let us assume that we have two networks whose transmission functions are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a signal $E(t)$ applied to the first network the short-circuit output current is $I_1(t) = Y_1(p) \cdot E(t)$. For a signal $V(t)$ applied to the second network the short-circuit output current is $I_2(t) = Y_2(p) \cdot V(t)$. With the networks sharing a common short-circuiting conductor as shown in Figure 4, the current through the conductor is $I_1 + I_2$. If the source which develops the voltage $V(t)$ across the input terminals of the second network were in fact under the control of the current through the conductor, as shown schematically in Figure 5, in such a manner that it had to develop that voltage $V(t)$ which reduces the current in the conductor to zero, then

$$Y_1(p) \cdot E(t) + Y_2(p) \cdot V(t) = 0 .$$

Hence, the transmission function (now a voltage-voltage ratio) of the arrangement shown in Figure 5 must be

$$Y(p) = -\frac{Y_1(p)}{Y_2(p)} . \qquad (9)$$

Figure 3. Schematic representation of networks for feedback circuit application.

Figure 4. First step in combining networks.

Figure 5. Output voltage controlled by short-circuit current across intermediate terminals.
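Why a large amplifier gain enforces (9) can be seen in a deliberately simplified model, which is an assumption of this sketch and not the actual circuit of Figure 5. The intermediate node is taken to have a total admittance Y_T(p) to ground, so its open-circuit voltage is (Y₁E + Y₂V)/Y_T, and an inverting amplifier of gain K sets V to -K times that voltage. Solving gives V/E = -KY₁/(Y_T + KY₂), which tends to -Y₁/Y₂ as K grows. The three network functions below are hypothetical and serve only as an illustration.

```python
def closed_loop(Y1, Y2, YT, K, p):
    """V/E for the simplified model V = -K (Y1 E + Y2 V) / YT."""
    return -K * Y1(p) / (YT(p) + K * Y2(p))


# Hypothetical network functions, for illustration only.
Y1 = lambda p: p / (1 + p)
Y2 = lambda p: 0.5 + p
YT = lambda p: 1 + 3 * p

p = 0.3j                       # a point on the imaginary axis, p = i*omega
ideal = -Y1(p) / Y2(p)         # equation (9)
errors = []
for K in (10, 1e3, 1e6):
    rel = abs(closed_loop(Y1, Y2, YT, K, p) - ideal) / abs(ideal)
    errors.append(rel)
    print(f"K = {K:>9g}: relative departure from -Y1/Y2 = {rel:.2e}")
```

In this model the relative departure is |Y_T/(Y_T + KY₂)|, a finite-gain correction factor of the kind discussed below for the exact expression.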

This relationship provides a method of obtaining transmission functions with complex poles without the requirement of coils. The complex roots of $Y(p)$ must be assigned to the numerator of $Y_1(p)$, and the complex poles of $Y(p)$ to the numerator of $Y_2(p)$. Aside from this, the other roots and poles of $Y(p)$ may be assigned in any way which is favorable to good design practice. Redundant factors may be introduced if they are desirable, as is done in the examples described in Sections 13.1.5 and 13.3.

The source of the voltage V(t) in Figure 5 
does not have to be controlled by the current
through the short-circuiting conductor. Since 
the current through any short circuit must be 
zero if the voltage across the short-circuited 
terminals is zero before the short circuit is con- 
nected across them, the source of the voltage 
V(t) may just as well be controlled by the 
open-circuit voltage, as shown in Figure 6. It 
is clear that the source of the voltage V(t) is 
ideally an infinite gain amplifier. It is not nec- 
essary, however, that the amplifier have ideally 
unilateral transmission and infinite input and 
output impedances, since departures from these 
ideal characteristics may be compensated for in 
the design of the feedback network. 

The simple result expressed by (9) may be readily modified to take account of the finite gain of a physical amplifier. The modification will be expressed as an extra factor which corresponds to the "μβ effect" or "μβ error" commonly encountered in the theory and design of feedback amplifiers.

This observation was first made by R. L. Dietzold.



Figure 6. Output voltage controlled by open- 
circuit voltage across intermediate terminals. 

The exact transmission function of the circuit shown in Figure 6 is most simply expressed in terms of the following quantities:

$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 1.
$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 2.
$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair No. 3 shorted.
$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead, terminal-pair No. 1 shorted, and terminal-pair No. 2 open.
$G(p)$ = transadmittance of amplifier.

i - 



The quantity $G Y_2 Z_2 Z_3$ is the μβ of the circuit. The quantity $Y_1 Y_2 Z_2 Z_3$, to which $Y$ reduces when $G = 0$, represents the direct transmission of the circuit.

The active impedance across terminal-pair No. 2 is

$$Z_{2A} = \frac{Z_{2P}}{1 - G Y_2 Z_2 Z_3}, \qquad (11)$$

where

$$Z_{2P} = Z_2\,(1 + Y_2^2 Z_2 Z_3) . \qquad (12)$$

$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It differs from $Z_2$ in that terminal-pair No. 3 is open.
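Equation (12) can be checked on a generic reciprocal network. The sketch below assumes (this is not stated in the text) that, with terminal-pair No. 1 shorted and the amplifier dead, the network between terminal-pairs 2 and 3 is described by short-circuit admittance parameters y22, y23 = y32 and y33. Then Z₂ = 1/y22, Z₃ is the impedance at pair 3 with pair 2 open, and Y₂ = ±y23; only Y₂² enters (12). The complex values are arbitrary stand-ins for the admittances at one frequency.

```python
def impedances(y22, y23, y33):
    """Return (Z2, Z3, Z2P) for a reciprocal two-port given its y-parameters."""
    Z2 = 1 / y22                         # at pair 2, pair 3 shorted
    Z3 = 1 / (y33 - y23**2 / y22)        # at pair 3, pair 2 open
    Z2P = 1 / (y22 - y23**2 / y33)       # at pair 2, pair 3 open
    return Z2, Z3, Z2P


# Arbitrary illustrative admittance parameters at one frequency.
y22, y23, y33 = 0.8 + 0.3j, -0.25 + 0.1j, 0.6 + 0.5j
Z2, Z3, Z2P = impedances(y22, y23, y33)
Y2 = -y23                                # sign convention is immaterial in (12)
rhs = Z2 * (1 + Y2**2 * Z2 * Z3)
print(abs(Z2P - rhs))
```

The difference is at the level of floating-point rounding, since Z₂(1 + Y₂²Z₂Z₃) and 1/(y22 - y23²/y33) are algebraically identical under these definitions.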

The exact expression (10) of the transmission function is useful chiefly as a check on the simpler but approximate expression (9). It is in general quite practicable to make the transadmittance or transconductance $G$ of the amplifier large enough so that the μβ effect may be neglected.

In accordance with the sense in which the term "servomechanism" is used by MacColl,[4] a feedback circuit, such as that shown in Figure 6, is a servomechanism (more specifically, an electronic servomechanism), since it operates on the ideal principle of maintaining zero voltage across the terminal-pair No. 3. An electromechanical counterpart of the circuit shown in Figure 6 is shown in Figure 7. These circuits assume that the signal $E(t)$ is a modulated d-c carrier.

Figure 7. Electromechanical counterpart of feedback amplifier circuit resulting in servomechanism.

If the signal is a modulated a-c carrier, 
"shaping" cannot be done conveniently by elec- 
trical networks. The difficulty may be avoided 
by various special devices. An example is de- 
scribed and illustrated in Section 13.4. 



In this section we will describe and illustrate 
two general methods of designing RC networks. 
The first is most useful when the transmission 
function is finite and not zero at zero fre- 
quency; the second, when the transmission 




function is zero at zero frequency. The case of a 
transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding
section, in conjunction with the methods de- 
scribed below. 


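Consider first a transmission function of the form

$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1} p^{n+1}}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \qquad (13)$$

with simple, real, negative poles. Dividing by $p$, expanding into partial fractions and multiplying through by $p$, we get

$$Y(p) = \frac{a_{n+1}}{b_n}\,p + a_0 + \left(\frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$

where the $A$'s, $B$'s, $\alpha$'s and $\beta$'s are positive real quantities. The first term must be associated with those in the first parentheses if $a_{n+1} > 0$, with those in the second parentheses if $a_{n+1} < 0$. The transmission function is now in the form

$$Y(p) = Y_A(p) - Y_B(p) \qquad (14)$$

where $Y_A(p)$ and $Y_B(p)$ are physically realizable driving-point admittances of RC type.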
Each term of the form $pA/(p + \alpha)$ is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8 and a conductance $a_0$ in the case of $Y_A(p)$, and a capacitance $|a_{n+1}|/b_n$ in the case of either $Y_A(p)$ or $Y_B(p)$. By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

Figure 8. Simple RC network.

Figure 9. Method of realizing RC transmission functions, requiring phase inverter.
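The split into Y_A and Y_B is a partial-fraction expansion of Y(p)/p. A minimal Python sketch follows, with hypothetical function names. It assumes the poles are supplied as the positive numbers α_i, with the denominator normalized as the product of the factors (1 + p/α_i). It also converts each term pA/(p + α) into element values: a resistance R in series with a capacitance C has admittance pC/(1 + pRC), so R = 1/A and C = A/α. That derivation is ours and is consistent with Figure 8, whose element labels are not legible.

```python
from fractions import Fraction
from math import prod


def split_rc(num, alphas):
    """Split Y(p) = N(p) / prod(1 + p/alpha_i) into Y_A(p) - Y_B(p).

    num    -- ascending coefficients a0, ..., a_(n+1) of N(p), with a0 > 0
    alphas -- distinct positive alpha_i; the poles are at p = -alpha_i
    Each part is a dict: 'G' (conductance), 'C' (capacitance) and
    'branches', a list of (A, alpha) standing for A p / (p + alpha).
    """
    n = len(alphas)
    if len(num) != n + 2:
        raise ValueError("numerator must have degree n + 1")
    N = lambda p: sum(a * p**k for k, a in enumerate(num))
    b_n = prod(Fraction(1) / al for al in alphas)   # leading denominator coefficient
    Y_A = {"G": num[0], "C": Fraction(0), "branches": []}
    Y_B = {"G": Fraction(0), "C": Fraction(0), "branches": []}
    cap = num[-1] / b_n                             # coefficient of the bare p term
    (Y_A if cap > 0 else Y_B)["C"] = abs(cap)
    for k, al in enumerate(alphas):
        rest = prod(1 - al / other for i, other in enumerate(alphas) if i != k)
        K = -N(-al) / rest                          # residue of Y(p)/p at p = -alpha_k
        if K > 0:
            Y_A["branches"].append((K, al))
        elif K < 0:
            Y_B["branches"].append((-K, al))
    return Y_A, Y_B


def admittance(part, p):
    return part["G"] + part["C"] * p + sum(A * p / (p + al) for A, al in part["branches"])


def branch_elements(A, alpha):
    """A p/(p + alpha) is a resistance 1/A in series with a capacitance A/alpha."""
    return 1 / A, A / alpha


# Example: Y(p) = (1 + p + p^2)/(1 + 2p), a single pole at p = -1/2.
num = [Fraction(1), Fraction(1), Fraction(1)]
Y_A, Y_B = split_rc(num, [Fraction(1, 2)])
print(Y_A, Y_B)
print([branch_elements(A, al) for A, al in Y_B["branches"]])
```

In the example the conductance 1 and the capacitance 1/2 go to Y_A, while the single branch with A = 3/4, α = 1/2 (resistance 4/3 in series with capacitance 3/2) goes to Y_B.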

The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

Figure 10. Lattice prototype for passive networks with RC transmission characteristics; $I = (Y_A - Y_B) \cdot E$.

The second general method of designing RC networks is most useful when

$$Y(p) = p\,\frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \qquad (15)$$

with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_0$, the output current would be

$$I = I_0\,\frac{1 - Y_B/Y_A}{1 + Y_B/Y_A} .$$

If, furthermore, 








Taking it for granted for the moment that the 
lattice can be transformed as shown schemat- 
ically in Figure 11, we may then discard the 
condenser across the output terminals and, by 
Thevenin's theorem, 1 " we may replace the 
condenser across the input terminals and the 
infinite-impedance current source by a series 
condenser and a zero-impedance voltage source. 
The result is shown in Figure 12. Since $I_0 = pC\,E$, we now have

$$I = pC\,\frac{1 - Y_B/Y_A}{1 + Y_B/Y_A}\,E ,$$

which is the desired result, to a constant factor.

The factor $k$ should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement. A suitable value may be easily chosen by inspection of a plot of $Y(p)/p$ for negative real values of $p$.

Figure 11. Step in transformation of networks with zero transmission at zero frequency.

Figure 12. Final step in transformation of networks with zero transmission at zero frequency.

The numerator and denominator of (16) are 
of equal degree and therefore contain the same 
number of linear factors. These factors may be 
assigned to $Y_A$ or to $Y_B$ arbitrarily except that $Y_A$ and $Y_B$ must be physically realizable driving-point admittance functions which behave
ultimately like condensers as the frequency in- 
creases indefinitely; that is, roots and poles 
must alternate and there must be a simple pole 
at infinity. 

There are five kinds of steps which may be taken to transform a lattice into an unbalanced form. These steps are based upon Bartlett's bisection theorem,[14] and may be taken in any order and as often as necessary. Each of them will now be described as it would be applied directly to Figure 10. In the following diagrams a lattice enclosed in a rectangle means an unbalanced network whose configuration may not be known yet, but whose lattice prototype is as shown.

1. Shunt network pulled out of both branches: shown in Figure 13.

2. Shunt network pulled out of the line branch only: shown in Figure 14.

3. Series network pulled out of both branches: shown in Figure 15.*

4. Series network pulled out of the lattice branch only: shown in Figure 16.*

5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.

Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.

Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.

Figure 15. Step in transformation of lattice; series networks pulled out of both branches.

Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.

* Given in impedance form.
As an example of (13) consider

$$Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p}$$

where all the coefficients are positive. Since

$$Y(p) = \frac{a_2}{b_1}\,p + a_0 + \left(a_1 - \frac{a_2}{b_1} - a_0 b_1\right)\frac{p}{1 + b_1 p}$$

there is no problem if $a_1 > (a_2/b_1) + a_0 b_1$. But if $a_1 < (a_2/b_1) + a_0 b_1$ we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless $a_1 > a_2/b_1$. Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

Figure 17. Illustrative lattice prototype.
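The case analysis for this example can be written out directly. The sketch below (function names are hypothetical; arithmetic is in exact fractions) reproduces the three-term expansion, checks it against the original quotient, and sorts a coefficient set into one of the three situations described: no lattice problem, the network of Figure 18, or a residual lattice that the five steps cannot transform.

```python
from fractions import Fraction as F


def expansion(a0, a1, a2, b1):
    """Terms of Y(p) = (a0 + a1 p + a2 p^2)/(1 + b1 p) as in the text:
    (a2/b1) p + a0 + K p/(1 + b1 p)."""
    return a2 / b1, a0, a1 - a2 / b1 - a0 * b1


def classify(a0, a1, a2, b1):
    _, _, K = expansion(a0, a1, a2, b1)
    if K >= 0:
        return "direct"            # every term is a positive RC admittance
    if a1 > a2 / b1:
        return "figure 18"         # steps 2 and 4, then step 3
    return "not transformable"     # the residual lattice cannot be transformed


def check(a0, a1, a2, b1, p):
    cap, g, K = expansion(a0, a1, a2, b1)
    return cap * p + g + K * p / (1 + b1 * p) == (a0 + a1 * p + a2 * p**2) / (1 + b1 * p)


print(classify(F(1), F(3), F(1), F(1)))      # K = 1
print(classify(F(1), F(2), F(1), F(2)))      # K = -1/2, a1 > a2/b1
print(classify(F(1), F(1, 4), F(1), F(2)))   # a1 < a2/b1
```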

As an example of (15) consider 

Taking $k = 1$ (the smallest value which may be assigned), we get

$$\frac{Y_B}{Y_A} = \frac{2p(3 + 16p)}{(1 + 2p)(1 + 16p)} .$$

One way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{(1 + 2p)(1 + 16p)}{2(3 + 16p)}, \qquad Y_B = p .$$

This leads finally to the network shown in Figure 19. Such a simple network is possible, of course, because $Y(p)$ happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p(3 + 16p)}{1 + 16p} .$$

This leads to the network shown in Figure 20.

Figure 18. Unbalanced equivalent of illustrative lattice prototype when $a_2/b_1 < a_1 < (a_2/b_1) + a_0 b_1$.

Figure 19. RC network with zero transmission at zero frequency.

Figure 20. Another RC network with zero transmission at zero frequency.


Chapter 13 


The illustrative material described in this chapter is taken from four practical applications:

1. Second-derivative circuit for the M9 antiaircraft director.

2. Position data smoother for the "close support plotting board," with delay correction for constant-velocity aircraft.

3. Position and rate circuit for the "computer for controlling bombers from the ground," with optional delay correction of position data for constant-velocity aircraft.

4. Position and rate circuit using electromechanical servomechanisms.

The design and analytical procedure used in the first application has not heretofore been described in writing. Hence, considerably more space will be devoted to it than to the other three applications. The latter have been described in detail in reports.[11, 12, 13]


13.1.1 Realizable Approximation of Best Transmission Function

The best transmission function for the second-derivative circuit was taken to be

$$Y(p) = p^2\,y_2(p),$$

in the notation of Chapter 11. This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of $y_2(p)$ is, according to expressions (15) of Chapter 11,

$$y_2(p) = 1 - \frac{1}{2}p + \frac{1}{7}p^2 - \frac{5}{168}p^3 + \frac{5}{1008}p^4 - \cdots .$$

The form of the rational approximation,

$$y(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4},$$

was chosen for simplicity under the requirement that the transmission function $p^2 y(p)$ should cut off at the rate of 12 db per octave.* This requirement was set as a precaution against noise due to granularity of the coordinate-conversion potentiometers in the director.

Following the procedure outlined in Section 12.2 the following equations were obtained:

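$$\begin{aligned}
b_1 - \tfrac{1}{2} &= 0 \\
b_2 - \tfrac{1}{2}b_1 + \tfrac{1}{7} &= 0 \\
b_3 - \tfrac{1}{2}b_2 + \tfrac{1}{7}b_1 - \tfrac{5}{168} &= 0 \\
b_4 - \tfrac{1}{2}b_3 + \tfrac{1}{7}b_2 - \tfrac{5}{168}b_1 + \tfrac{5}{1008} &= 0
\end{aligned}$$

Solving all four equations gives $b_1 = 1/2$, $b_2 = 3/28$, $b_3 = 1/84$, $b_4 = 1/1764$, so that

$$y_2(p) \approx \frac{1764}{p^4 + 21p^3 + 189p^2 + 882p + 1764} = \frac{1764}{\left(p^2 + \frac{21 + \sqrt{21}}{2}\,p + 42\right)\left(p^2 + \frac{21 - \sqrt{21}}{2}\,p + 42\right)},$$

and $y_2(p)$ would have two conjugate pairs of complex poles, viz.,

$$p = -6.40 \pm i\,1.047, \qquad -4.10 \pm i\,5.02,$$

of which one pair is very nearly real.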
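The factorization and the pole positions quoted above can be checked directly. The sketch below takes the two quadratic factors, solves each with the quadratic formula, and confirms that every root satisfies the full quartic. The roots come out near -6.40 ± 1.047i and -4.10 ± 5.015i.

```python
import cmath
import math

S = math.sqrt(21)
# Quadratic factors p^2 + c1 p + c0 of p^4 + 21p^3 + 189p^2 + 882p + 1764.
factors = [((21 + S) / 2, 42.0), ((21 - S) / 2, 42.0)]


def quad_roots(c1, c0):
    d = cmath.sqrt(c1 * c1 - 4 * c0)
    return (-c1 + d) / 2, (-c1 - d) / 2


def quartic(p):
    return p**4 + 21 * p**3 + 189 * p**2 + 882 * p + 1764


poles = [r for c1, c0 in factors for r in quad_roots(c1, c0)]
for r in poles:
    print(f"p = {r.real:.2f} {r.imag:+.3f}i   |quartic(p)| = {abs(quartic(r)):.1e}")
```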

In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving $b_4$ arbitrary so that the denominator of $y_2(p)$ was

$$1 + \frac{1}{2}p + \frac{3}{28}p^2 + \frac{1}{84}p^3 + b_4 p^4 .$$

A value for $b_4$ which would make this expression vanish at two negative real values of $p$ was found by setting $p = -x$ and plotting

$$1764\,b_4 = \frac{21}{x^4}\left(x^3 - 9x^2 + 42x - 84\right)$$

against $x$, as shown in Figure 1. The right-hand member is positive only in the range $x > 3.77$ and has a maximum of 0.982 at about $x = 6.63$.

* The design antedated the formulation of the $n - m = r + 1$ rule given in Section 12.2, according to which the best transmission function should have been taken as $p^2 y_3(p)$ in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation of the complexity assumed.




Figure 1. Graphical determination of $b_4$ (ordinate $1764\,b_4$, abscissa $x$).
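The graphical step can be reproduced numerically. The sketch below finds where the right-hand member changes sign, locates its maximum on a fine grid, and then solves 1764 b₄ = 0.5 for the two corresponding values of x. A fine grid puts the peak at about 0.981 near x ≈ 6.65, which agrees with the graphically read 0.982 at about 6.63 to within plotting accuracy. For the chosen value 0.5 the two solutions fall at the real poles -4.17391 and -31.72813 quoted below. The search brackets are our choice.

```python
def rhs(x):
    """1764 b4 as a function of x, where p = -x is a required real zero."""
    return 21 * (x**3 - 9 * x**2 + 42 * x - 84) / x**4


def bisect(f, lo, hi, tol=1e-10):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_zero = bisect(rhs, 1.0, 5.0)                  # where 1764 b4 changes sign
grid = [3.0 + 1e-4 * k for k in range(100_001)]  # x from 3 to 13
x_peak = max(grid, key=rhs)
print(f"positive for x > {x_zero:.2f}; peak {rhs(x_peak):.3f} at x = {x_peak:.2f}")

# Any 0 < 1764 b4 < peak gives two real zeros; the design used 0.5.
c = 0.5
x1 = bisect(lambda x: rhs(x) - c, x_zero, x_peak)
x2 = bisect(lambda x: rhs(x) - c, x_peak, 100.0)
print(f"real poles at p = {-x1:.5f} and p = {-x2:.5f}")
```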

In order to obtain a substantial separation between the two real poles of $y_2(p)$, the value $1764\,b_4 = 0.5$ was chosen. The approximation

$$y_2(p) \approx \frac{3528}{p^4 + 42p^3 + 378p^2 + 1764p + 3528}$$

has poles at

$$p = -4.17391, \quad -31.72813, \quad -3.04898 \pm i\,4.16463 .$$

The series expansion of this approximation agrees with that of $y_2(p)$ to four terms, the fifth term being $\frac{37}{7056}p^4$ instead of $\frac{5}{1008}p^4$. The difference in the fifth term is less than 6 per cent.
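The pole positions and the fifth series coefficient quoted above can be reproduced with the Python standard library. In this sketch the two real poles are bracketed and found by bisection (the brackets are our choice), and the quartic is deflated to a quadratic that gives the complex pair. The reciprocal series of the normalized denominator is then expanded in exact fractions, confirming the coefficient 37/7056 and its relative excess of 2/35 (about 5.7 per cent) over 5/1008.

```python
import cmath
from fractions import Fraction

DEN = [1.0, 42.0, 378.0, 1764.0, 3528.0]   # p^4 + 42p^3 + 378p^2 + 1764p + 3528


def horner(coeffs, z):
    """Evaluate a polynomial given in descending powers."""
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return acc


def bisect(f, lo, hi, tol=1e-12):
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def deflate(coeffs, root):
    """Synthetic division by (p - root); the (near-zero) remainder is dropped."""
    out = [coeffs[0]]
    for c in coeffs[1:-1]:
        out.append(c + out[-1] * root)
    return out


f = lambda p: horner(DEN, p)
r1 = bisect(f, -6.0, -3.0)
r2 = bisect(f, -40.0, -20.0)
_, c1, c0 = deflate(deflate(DEN, r1), r2)
disc = cmath.sqrt(c1 * c1 - 4 * c0)
pair = ((-c1 + disc) / 2, (-c1 - disc) / 2)
print(r1, r2, pair)

# Reciprocal power series of 1 + p/2 + 3p^2/28 + p^3/84 + p^4/3528, exactly.
d = [Fraction(1), Fraction(1, 2), Fraction(3, 28), Fraction(1, 84), Fraction(1, 3528)]
q = [Fraction(1)]
for k in range(1, 5):
    q.append(-sum(d[j] * q[k - j] for j in range(1, k + 1)))
print(q)
```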

The realized approximation and the best 
weighting function are shown in Figure 3. 

13.1.2 Transient Responses

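The responses of the physical network, whose transmission function is $p^2 y(p)$ with $y(p)$ the rational approximation obtained above, are compared to those of the best network, whose transmission function is $p^2 y_2(p)$, in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below.

Response formulas (all responses are zero for $t < 0$)

Signal $E(t)$, $t > 0$ | Realized | Best, $0 < t < 1$ | Best, $t > 1$
$1$ | $L^{-1}[p\,y(p)]$ | $60t(1 - t)(1 - 2t)$ | $0$
$t$ | $L^{-1}[y(p)]$ | $30t^2(1 - t)^2$ | $0$
$\tfrac{1}{2}t^2$ | $L^{-1}[y(p)/p]$ | $t^3(10 - 15t + 6t^2)$ | $1$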
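The three best-response formulas are linked. The step, ramp and parabolic signals are successive integrals of one another, so each response is the time derivative of the next, and the ramp response is the weighting function itself. The sketch below checks this with polynomial arithmetic. Assuming W(t) = 30t²(1 - t)² on 0 < t < 1, it also recovers from the moments of W the series coefficients 1, -1/2, 1/7, -5/168, 5/1008; the last of these is the 5/1008 quoted in Section 13.1.1.

```python
from fractions import Fraction as F
from math import factorial

# Best responses for T = 1, as ascending coefficients in t (valid for 0 < t < 1).
STEP = [0, 60, -180, 120]             # 60 t (1 - t)(1 - 2t)
RAMP = [0, 0, 30, -60, 30]            # 30 t^2 (1 - t)^2, also the weighting W(t)
PARABOLA = [0, 0, 0, 10, -15, 6]      # t^3 (10 - 15 t + 6 t^2)


def derivative(poly):
    return [k * c for k, c in enumerate(poly)][1:]


def at(poly, t):
    return sum(c * t**k for k, c in enumerate(poly))


def series_coefficient(w, k):
    """Coefficient of p^k in the Laplace transform of w(t) on 0 < t < 1."""
    moment = sum(F(c, k + j + 1) for j, c in enumerate(w))   # integral of t^k w(t)
    return F((-1) ** k, factorial(k)) * moment


print(derivative(PARABOLA) == RAMP, derivative(RAMP) == STEP)
print(at(PARABOLA, 1))                  # final value of the parabola response
print([series_coefficient(RAMP, k) for k in range(5)])
```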

It has been noted that Figure 3 also represents the best and the realized weighting functions.

Figure 2. Responses to step function, viz., $E(t) = 1$ when $t > 0$.

Figure 3. Responses to linear ramp function, viz., $E(t) = t$ when $t > 0$; second derivative smoothing characteristics.

Figure 4. Responses to parabolic ramp function, viz., $E(t) = \tfrac{1}{2}t^2$ when $t > 0$; second derivative settling characteristics.




If a signal of the form

$$E(t) = a_0 + a_1 t + \tfrac{1}{2}a_2 t^2 + \cdots, \qquad t > 0,$$

were to be applied suddenly to the second-derivative circuit at $t = 0$, the response would be

$$V(t) = \frac{a_0}{T^2}A_0\!\left(\frac{t}{T}\right) + \frac{a_1}{T}A_1\!\left(\frac{t}{T}\right) + a_2 A_2\!\left(\frac{t}{T}\right),$$

where $A_0$, $A_1$, $A_2$ stand for the responses shown in Figures 2, 3, and 4, respectively, and where $t$ is the time in seconds and $T$ is the nominal smoothing time. The response $V(t)$ is the indicated acceleration of the target.

The sudden application of the instantaneous position and velocity components of the signal to the second-derivative circuit will give rise to some very serious consequences unless special measures are taken to mitigate them. To see this, let it be assumed that $T = 20$ seconds and that the target is at such a range that $a_0 = 20{,}000$ yards when the signal $E(t)$ is applied to the second-derivative circuit. Each unit of $A_0$ in the ordinate scale of Figure 2 then represents an indicated acceleration of $a_0/T^2 = 50$ yd per sec$^2$. Referring to Figure 2, it is clear not only that the effective settling time will be several times the smoothing time but also that the indicated acceleration will go through exceedingly large maxima.
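The size of this transient can be estimated from the superposition formula. The sketch below uses only the best-network step response 60τ(1 - τ)(1 - 2τ); the realized network's curve in Figure 2 is not reproduced. Each unit of A₀ is 50 yd/s², and even the ideal response reaches 10/√3 ≈ 5.77 units, about 289 yd/s² or roughly 27 g.

```python
import math

T = 20.0          # nominal smoothing time, s
a0 = 20_000.0     # position term at the moment the signal is applied, yd


def A0_best(tau):
    """Best-network step response for T = 1 (the ideal curve of Figure 2)."""
    return 60 * tau * (1 - tau) * (1 - 2 * tau) if 0 < tau < 1 else 0.0


scale = a0 / T**2                                    # yd/s^2 per unit of A0
peak_unit = max(abs(A0_best(k / 100_000)) for k in range(100_001))
peak_accel = scale * peak_unit
print(f"{scale:.0f} yd/s^2 per unit; best-network peak about {peak_accel:.0f} yd/s^2")
```

In closed form the extremes occur at τ = (3 ∓ √3)/6, where |A₀| = 10/√3.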

Exceedingly large transient responses are 
not peculiar to second-derivative circuits. They 
occur also in first-derivative circuits in linear 
prediction, where they are due entirely to the 
initial position term in the signal. In all cases 
they are reduced to harmless proportions by 
special arrangements of the circuits during the 
operation of slewing. 

13.1.3 Effect of Tracking Errors on Accuracy of Prediction

The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.

Table 1 gives the values of the transmission function $Y_1$ of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second, and the transmission function $Y_s$ of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second. The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then

$$Y_l(p) = 1 + \frac{t_f}{10}\,Y_1(10p),$$

while that of the quadratic prediction circuit with 20-second smoothing of second derivative is

$$Y_q(p) = 1 + \frac{G_1}{10}\,Y_1(10p) + \frac{G_2}{400}\,Y_s(20p),$$

where $G_1$ and $G_2$ are determined in accordance with the discussion in Section A.10. Since

$$Y_1(p) = p\,(1 - 0.3724p + \cdots), \qquad Y_s(p) = p^2\,(1 - \cdots),$$

we get

$$G_1 = t_f, \qquad G_2 = \tfrac{1}{2}t_f^2 + 3.724\,t_f .$$

Table 1*. Transmission functions $Y_1$ and $Y_s$, referred to a nominal smoothing time of 1 second.

* $\omega$ is in radians per second when the smoothing time $T = 1$ sec. For $T$-second networks, values of $\omega$ should be multiplied by $1/T$, values of $Y_1$ should be divided by $T$, and values of $Y_s$ should be divided by $T^2$. The two networks may have different values of $T$.

Table 2 gives the values of $|Y_l(i\omega)|^2$ and of $|Y_q(i\omega)|^2$ for $t_f$ = 5, 10, 15, 20 seconds. These are plotted in Figures 5, 6, 7, and 8.

The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.

The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

Table 2. Power transmission ratios $|Y_l|^2$ and $|Y_q|^2$, and power spectrum of tracking errors.

Rms error of prediction due to tracking errors, in yards, for linear and quadratic prediction, by time of flight in seconds.

It is obviously relatively disadvantageous to use quadratic prediction when the target is in fact flying a rectilinear unaccelerated course.

Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.

Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.

Figure 7. Power transmission ratio of linear and quadratic prediction circuits with 15-second prediction time.

Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.
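The statement that the rms error of prediction is the square root of the power transmitted by the prediction circuit amounts to integrating |Y(iω)|² against the tracking-error spectrum. The sketch below does this by the trapezoidal rule. The flat spectrum and the unsmoothed predictor 1 + t_f p are hypothetical stand-ins for the measured spectrum of Figure 9 and the actual circuits. The last line simply checks that 15.8 yd out of 17.9 yd corresponds to the quoted 78 per cent of the power.

```python
import math


def transmitted_power(Y, P, omegas):
    """Trapezoidal estimate of the integral of |Y(i w)|^2 P(w) over the grid."""
    vals = [abs(Y(1j * w)) ** 2 * P(w) for w in omegas]
    return sum(0.5 * (vals[k] + vals[k + 1]) * (omegas[k + 1] - omegas[k])
               for k in range(len(omegas) - 1))


def rms_prediction_error(Y, P, omegas):
    return math.sqrt(transmitted_power(Y, P, omegas))


# Hypothetical inputs: unit tracking-error power spread uniformly up to W rad/s.
W = 0.5
grid = [W * k / 2000 for k in range(2001)]
flat = lambda w: 1.0 / W
print(rms_prediction_error(lambda p: 1.0, flat, grid))             # Y = 1 passes it all
print(rms_prediction_error(lambda p: 1.0 + 10.0 * p, flat, grid))  # unsmoothed, t_f = 10 s

# Fraction of the total power implied by the quoted rms values.
print((15.8 / 17.9) ** 2)
```

Even this crude example shows the mechanism: any gain above unity in |Y(iω)| over the band of the tracking errors raises the rms error of prediction.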

The relative advantage of linear prediction 
should persist for target paths with only a 
slight amount of curvature, but this relative 
advantage should decrease as the curvature is 
increased. When the curvature exceeds a cer- 
tain amount, the relative advantage should 
shift to quadratic prediction. 
The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon

1. dispersion of the predicted point of impact due to tracking errors,

but also upon a number of other factors, among which are:

2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;*

3. dispersion due to inaccuracies in the computer and data-transmission systems;

4. dispersion due to noise in the computer and data-transmission systems;

5. dispersion due to variations in actual dead time;

6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;

7. dispersion due to variations in meteorological conditions along the path of the shell;

8. dispersion due to variability of time-fuze calibration; and

9. lethal pattern of shell burst.

In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.

Figure 9. Composite power spectrum of tracking errors of experimental radar.

* This is considered in detail in the next section.




13.1.4 Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses

The use of a finite number of derivatives of 
the tracking data for purposes of prediction is 
itself a source of prediction errors even if there 
were no tracking errors. Definite evaluation of 
these prediction errors can be made only if the 
path of the target is prescribed. The simplest 
path which can be prescribed for this purpose 
is a circular one at constant velocity. Such a 
path is fairly realistic when considered in rela- 
tion to the difficulty of maneuvering a bomber 
and to actual records of the paths of hostile 
bombers over London during World War II. 

The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity $R e^{i\omega t}$, where $R$ is the radius of the circle and $\omega$ is the angular rate. In terms of the velocity $V$ and the transverse acceleration $A$, we have $R = V^2/A$, $\omega = A/V$. The predicted position is then at $R\,Y(i\omega)\,e^{i\omega t}$, where $Y(i\omega)$ is the transmission function of the prediction circuit. The true future position of the target, however, is at $R \exp[i\omega(t + t_f)]$. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is

$$\epsilon = R\left[Y(i\omega) - e^{i\omega t_f}\right] .$$

As an illustration let us consider a case in which $V = 150$ yd per sec, $A = 5$ yd per sec$^2$ and $t_f = 10$ sec. For the linear prediction circuit

$$Y_l(i\omega) = 1.0409 + i\,0.3296$$

and for the quadratic prediction circuit

$$Y_q(i\omega) = 0.9501 + i\,0.3610,$$

while

$$e^{i\omega t_f} = 0.9450 + i\,0.3272 .$$

Hence, when the present position of the target is at $4500 + i0$ with respect to the center of the circle, the linear predicted point is at $4684 + i1483$, the quadratic predicted point is at $4276 + i1624$, while the true future position is at $4252 + i1472$. These are shown in Figure 10. The prediction error vectors are

$$\epsilon_l = 432 + i11, \quad |\epsilon_l| = 432; \qquad \epsilon_q = 24 + i152, \quad |\epsilon_q| = 154 .$$
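The arithmetic of this example is easy to reproduce. The sketch takes the quoted values of Y(iω) as given, without recomputing them from the prediction circuits, and evaluates the predicted points, the true future position and the error vectors.

```python
import cmath

V, A, t_f = 150.0, 5.0, 10.0            # yd/s, yd/s^2, s
R = V**2 / A                            # radius of the circle, yd
omega = A / V                           # angular rate, rad/s

Y_l = 1.0409 + 0.3296j                  # Y(i omega) values quoted in the text
Y_q = 0.9501 + 0.3610j
future = cmath.exp(1j * omega * t_f)    # e^{i omega t_f}

points = {"linear": R * Y_l, "quadratic": R * Y_q, "true": R * future}
errors = {"linear": R * (Y_l - future), "quadratic": R * (Y_q - future)}

for name, z in points.items():
    print(f"{name:9s} point: {z.real:7.1f} + i{z.imag:6.1f}")
for name, e in errors.items():
    print(f"{name:9s} error: {e.real:6.1f} + i{e.imag:6.1f}, |e| = {abs(e):5.1f} yd")
```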

Referring to Figure 10, it may be observed that if the first-derivative component of the prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors $G_1$ and $G_2$ in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been made.

Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses.

13.1.5 Physical Configuration of the Second-Derivative Circuit

In this section we shall derive a physical configuration for the second-derivative circuit. In particular, it illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.* It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.

* Originally proposed by R. L. Dietzold.

The transmission function which concerns us here may be expressed in the partially factored form

$$Y(p) = \frac{p^2}{(p + 0.2087)(p + 1.5864)(p^2 + 0.3049p + 0.0666)},$$

where the poles have been adjusted to correspond to $T = 20$ seconds and where a constant factor has been left out.

The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director. Since this much of the first-derivative circuit has a transmission function of the form $p/(p + 0.24)$, the transmission function which we have to realize is $Y_i(p)/Y_f(p)$, where

$$Y_i(p) = \frac{p}{(p + 0.2087)(p + 1.5864)}, \qquad Y_f(p) = \frac{p^2 + 0.3049p + 0.0666}{p + 0.24} .$$
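The T = 20 factors follow from the 1-second poles of Section 13.1.1. Replacing p by Tp divides every pole by T. The sketch below does that scaling, forms the quadratic for the complex pair, and checks that the existing stage p/(p + 0.24) times the input-network function over the feedback-network function (written Y_i and Y_f here) reproduces Y(p), with the constant factor dropped as in the text.

```python
T = 20.0
poles_1s = [-4.17391, -31.72813, complex(-3.04898, 4.16463)]   # Section 13.1.1, T = 1
r1, r2, z = (pole / T for pole in poles_1s)
c1, c0 = -2 * z.real, abs(z) ** 2          # p^2 + c1 p + c0 for the complex pair
print(f"(p + {-r1:.4f})(p + {-r2:.4f})(p^2 + {c1:.4f} p + {c0:.4f})")


def Y(p):                                  # second-derivative circuit, constant dropped
    return p**2 / ((p - r1) * (p - r2) * (p * p + c1 * p + c0))


def Y_d(p):                                # existing first-derivative stage
    return p / (p + 0.24)


def Y_i(p):                                # input network
    return p / ((p - r1) * (p - r2))


def Y_f(p):                                # feedback network
    return (p * p + c1 * p + c0) / (p + 0.24)


p = 0.05j
print(abs(Y_d(p) * Y_i(p) / Y_f(p) - Y(p)))
```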

The inversion of the factor corresponding to $Y_f(p)$ is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function $Y_i(p)/Y_f(p)$ it is therefore necessary only to realize the transmission functions $Y_i(p)$ and $Y_f(p)$ individually. The corresponding networks are shown in Figure 11, with typical element values.

Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director.

The input network has four elements, whereas $Y_i(p)$ has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.

The feedback network has four independent elements, whereas $Y_f(p)$ has three parameters. Hence there is only one degree of freedom in the element values of this network. This degree of freedom must be reserved for the impedance level.

There is, however, one degree of freedom be- 
tween the impedance levels of the two net- 
works. This follows from the fact that the 
transmission function of the circuit is the ratio 
of the transmission functions of the individual 
networks. The scale factor for the transmission 
function of the circuit is readily determined 
from the fact that the transmission function 
must be approximately pR t ,C„ at small values 
of p. 



In this application, position data smoothing with delay correction for constant rates of change in position was required. Assuming flat random noise in position data and, arbitrarily, 1-second smoothing time, the best transmission function for position data smoothing without delay correction is $y_0(p)$ in the notation of Section 11.3. The best transmission function for the first-derivative circuit, if it were required, is $p\,y_1(p)$. Hence, the best transmission function for position data smoothing with full delay correction is

$$Y_1(p) = y_0(p) + \tfrac{1}{2}\,p\,y_1(p) .$$

This corresponds to the weighting function

$$W_1(t) = 2(2 - 3t), \qquad 0 < t < 1 .$$

The series expansion for $Y_1(p)$ is, by (15) of Chapter 11,

$$Y_1(p) = 1 - \frac{p^2}{12} + \frac{p^3}{30} - \frac{p^4}{120} + \cdots .$$




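The form of the rational approximation was chosen as

$$Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3}$$

in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by

$$\begin{aligned}
b_1 - a_1 &= 0 \\
b_2 - \tfrac{1}{12} &= 0 \\
b_3 - \tfrac{1}{12}b_1 + \tfrac{1}{30} &= 0 \\
-\tfrac{1}{12}b_2 + \tfrac{1}{30}b_1 - \tfrac{1}{120} &= 0
\end{aligned}$$

whence

$$Y(p) = \frac{1 + \frac{11}{24}p}{1 + \frac{11}{24}p + \frac{1}{12}p^2 + \frac{7}{1440}p^3} .$$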
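These coefficients, and the factored form used for the circuit, can be reproduced from the weighting function alone. The sketch expands W₁(t) = 2(2 - 3t) into its power series through its moments and solves the four matching equations in exact fractions. It then splits off the real factor of the cubic denominator by bisection; the bisection bracket is our choice.

```python
from fractions import Fraction as F
from math import factorial


def series_from_weighting(w, terms):
    """Power series of the Laplace transform of w(t) = sum(w[j] t^j), 0 < t < 1."""
    return [F((-1) ** k, factorial(k)) * sum(F(c, k + j + 1) for j, c in enumerate(w))
            for k in range(terms)]


c = series_from_weighting([4, -6], 5)        # W_1(t) = 4 - 6t
assert c[1] == 0                             # full delay correction: no p term

# Matching (1 + a1 p)/(1 + b1 p + b2 p^2 + b3 p^3) through p^4, using c1 = 0:
b2 = -c[2]                                   # p^2: c2 + b2 = 0
b1 = -(c[4] + b2 * c[2]) / c[3]              # p^4: c4 + b1 c3 + b2 c2 = 0
b3 = -(c[3] + b1 * c[2])                     # p^3: c3 + b1 c2 + b3 = 0
a1 = b1                                      # p^1: a1 = b1 + c1
print("a1 =", a1, " b =", b1, b2, b3)


def den(p):
    return 1 + float(b1) * p + float(b2) * p**2 + float(b3) * p**3


lo, hi = -12.0, -8.0                         # den(lo) < 0 < den(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if den(mid) < 0:
        lo = mid
    else:
        hi = mid
tau = -1 / (0.5 * (lo + hi))                 # real factor (1 + tau p)
q1 = float(b1) - tau                         # remaining 1 + q1 p + q2 p^2
q2 = float(b3) / tau
print(f"(1 + {tau:.4f}p)(1 + {q1:.4f}p + {q2:.5f}p^2)")
```

The real factor comes out as 1 + 0.1053p and the complex-pole factor as 1 + 0.3530p + 0.04615p².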


This may be expressed in the form $Y(p) = Y_i(p)/Y_f(p)$, where

$$Y_i(p) = \frac{1}{1 + 0.1053p}, \qquad Y_f(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p} .$$

The circuit configuration is shown below in Figure 12.

Figure 12. Physical configuration of data-smoothing circuit for close support plotting board.

* This design also antedated the formulation of the $n - m = r + 1$ rule given in Section 12.2, according to which we should have taken $Y_1(p) = y_1(p) + \tfrac{1}{2}\,p\,y_2(p)$.


In this application, rate smoothing as well as 
position smoothing was required. In addition, 
delay correction in position, for constant rate 
of change, was to be available but optional, and 
the loss characteristic was to have an ultimate 
slope of 12 db per octave, or more. 

In accordance with the $n - m = r + 1$ rule, the best transmission function for position data is $y_1(p)$, whereas that for rate is $p\,y_2(p)$. A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on $y_2(p)$ for position data. The use of $y_2(p)$ for position data is not consistent, theoretically, with the use of $p\,y_2(p)$ for rate, but the practical advantage outweighs the theoretical disadvantage.

Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer.

The rational approximation used for $y_2(p)$ is the one given in (6), Section 12.2. It may be expressed as





1 + 0.2153p 

1 + 0.2847p + 0.03870p» 
1 + 0.135<Jp 


1 + 0.135*)p 





It may be noted that a redundant factor, viz., 1 + 0.1359p, has been introduced in order to secure a physically realizable $Y_2(p)$. The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the transmission function of the input network is $Y_1(p)$, that of the feedback network is $Y_2(p)$, and that of the output network at the top is $Y_3(p)$.

The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback. Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is $pY_3(p)$. Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.
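The rational approximation for $y_2(p)$ can be checked numerically. The sketch below is a minimal illustration, not part of the original design procedure: it takes the printed coefficients (numerator 1 + 0.2153p, denominator 1 + 0.2847p + 0.03870p², redundant factor 1 + 0.1359p), confirms that splitting off the redundant factor leaves the function unchanged, and computes the first-order lag for a constant-rate input. That lag, the difference of the p coefficients in denominator and numerator, is our own derived figure, offered only to show why an optional delay correction in position is useful.

```python
# Coefficients of y_2(p) as read from the report: numerator 1 + A1*p, denominator 1 + B1*p + B2*p^2.
A1 = 0.2153
B1, B2 = 0.2847, 0.03870
REDUNDANT = 0.1359  # redundant factor 1 + 0.1359p introduced for realizability


def y2(p: complex) -> complex:
    """Direct form of the rational approximation y_2(p)."""
    return (1 + A1 * p) / (1 + B1 * p + B2 * p**2)


def y2_factored(p: complex) -> complex:
    """Same function split into two factors that share the redundant term 1 + 0.1359p."""
    first = (1 + A1 * p) / (1 + REDUNDANT * p)
    second = (1 + REDUNDANT * p) / (1 + B1 * p + B2 * p**2)
    return first * second


def ramp_lag() -> float:
    """For small p, y_2(p) ~ 1 - (B1 - A1) p: a pure delay of B1 - A1 on a constant-rate input."""
    return B1 - A1


if __name__ == "__main__":
    for omega in (0.0, 1.0, 5.0, 20.0):
        p = 1j * omega
        diff = abs(y2(p) - y2_factored(p))
        print(f"omega={omega:5.1f}  |y2|={abs(y2(p)):.4f}  factored difference={diff:.1e}")
    print("lag on a ramp:", round(ramp_lag(), 4))
```

The lag comes out at about 0.069 in whatever time unit the coefficients are expressed in; the magnitude falls off as 1/ω at high frequency because the denominator is one degree higher than the numerator.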

In the final report, October 25, 1945, to 
NDRC Division 7, on the research program car- 
ried on under Contract NDCrc-178, a list is 
given of a number of the more important prac- 
tical advantages for the use of a-c carrier in 
computing circuits. These advantages are: 

1. Permits operation at lower levels before running into trouble with thermal noise, contact potentials, drifts due to temperature;

2. Permits use of transformers for impedance matching, voltage transformations, coupling between balanced and unbalanced circuits;

3. Permits use of hybrid coils for voltage summations of moderate precision;

4. Eliminates the necessity for modulators in servo circuits using a-c motors;

5. Permits reduction in total power consumption, rectified power for amplifiers, and voltage
However, the techniques of differentiation 
and of data smoothing with fixed networks in 
computing circuits which use d-c carrier are
not applicable to computing circuits which use 
a-c carrier. 

The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director. In Figure 14, servo motors are indicated by M, and generators by G. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities $\dot\theta_1$ and $\dot\theta_2$ of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions $\theta_1$ and $\theta_2$ of the shafts from some reference positions. The position data are represented by the modulation amplitude E.

Figure 14. Electromechanical linear prediction.

The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.

With amplifiers of sufficiently large voltage gain and power capacity, and motors of sufficiently large torque, the operational equations of the circuit are readily found by equating to zero the sum of the voltages applied to each amplifier. Thus

$$\theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E, \qquad p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0,$$

whence

$$\theta_1 = \frac{1 + \alpha_2 p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E, \qquad \theta_2 = \frac{p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E .$$

The angular position $\theta_1$ therefore represents the smoothed position data while the angular position $\theta_2$ represents the smoothed rate.
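The operational equations can be checked with a short time-domain simulation. This is a minimal sketch under stated assumptions: the constants α₁ = 0.5, α₂ = 0.2, β = 0.1 are arbitrary illustrative values (not from the report), the input is a constant-rate position E = vt, and forward Euler integration is our choice. Expanding the transfer functions for small p gives θ₁ ≈ (1 − α₁p)E and θ₂ ≈ pE, so θ₂ should settle at the rate v while θ₁ follows the input with a lag of α₁v.

```python
def simulate(alpha1, alpha2, beta, rate, t_end=20.0, dt=1e-3):
    """Forward-Euler integration of the two operational equations for E(t) = rate * t.

    theta1 + alpha1*theta2 + beta*dtheta2/dt = E   ->  dtheta2/dt = (E - theta1 - alpha1*theta2) / beta
    dtheta1/dt - theta2 - alpha2*dtheta2/dt = 0    ->  dtheta1/dt = theta2 + alpha2*dtheta2/dt
    """
    theta1 = theta2 = 0.0
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        e = rate * t
        dtheta2 = (e - theta1 - alpha1 * theta2) / beta
        dtheta1 = theta2 + alpha2 * dtheta2
        theta1 += dt * dtheta1
        theta2 += dt * dtheta2
        t += dt
    return t, theta1, theta2


if __name__ == "__main__":
    # Illustrative constants only; they are not taken from the report.
    alpha1, alpha2, beta, rate = 0.5, 0.2, 0.1, 3.0
    t, theta1, theta2 = simulate(alpha1, alpha2, beta, rate)
    print(f"t = {t:.3f}")
    print(f"theta2 = {theta2:.6f}   (input rate {rate})")
    print(f"theta1 = {theta1:.6f}   (E - alpha1*rate = {rate * t - alpha1 * rate:.6f})")
```

With these constants the characteristic polynomial 1 + 0.7p + 0.1p² has roots at p = −2 and p = −5, so the transient has died out long before t = 20.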


Chapter 14 


The past discussion has been more or less 
clearly directed at predictor systems hav- 
ing certain well-defined properties. For ex- 
ample, it has been tacitly assumed that the first 
part of the prediction system will consist of 
geometrical manipulations transforming the 
raw input data into other quantities, such as 
the components of velocity in Cartesian or in- 
trinsic coordinates, which we have some physi- 
cal reason to believe should be approximately 
constant for extended periods.ᵃ These quantities, then, are isolated explicitly in the circuit
and are the actual effective inputs of the data- 
smoothing networks. The data-smoothing net- 
works themselves are, of course, definitely 
assumed to be linear and invariable. 

This is obviously a straightforward attack 
but it does not necessarily exhaust all possibili- 
ties. For example, advantages may be gained 
by using data-smoothing networks which are 
nonlinear or which vary with time or target 
position. It may also be possible to smooth the 
input data according to some geometric as- 
sumption, such as straight line flight, without 
the necessity of isolating geometrical parame- 
ters explicitly. 

This chapter attempts to illustrate these pos- 
sibilities by some rather scattered examples. 
Data-smoothing networks which vary with time 
seem to give improved performance over fixed 
networks, and have been studied with some 
care. Several examples are given at the end of 
the chapter. None of the other lines, however, 
has been explored at all thoroughly. The ex- 
amples of data-smoothing networks variable 
with time are, in a sense, illustrations of non- 
linearity also, since they all operate on the 
assumption that the cycle of the network's 
variation with time begins anew at each 
marked change in course. Since a change in 
course is exactly like a tracking error, except 
that it is much larger, this resetting requires a nonlinear control circuit which responds to large-amplitude effects but not to small ones.

a This is true ideally even in the Wiener system, since Wiener assumes that transformations will be made to some suitable coordinate system, preferably the intrinsic, before the statistical prediction method is applied.

This, however, is evidently a very mild sort of 
nonlinearity. More thoroughgoing nonlineari- 
ties have not been studied. There seems to be 
no a priori reason for supposing that they 
would appreciably improve the performance 
of data-smoothing networks. 

The first part of the chapter gives examples 
of data-smoothing schemes which do not re- 
quire the isolation of geometrical parameters. 
They are based on degenerative feedback cir- 
cuits which satisfy the requisite formal rela- 
tions but which might, in some cases, be un- 
stable in practice. This portion of the material 
is included primarily for its possible suggestive value rather than for its concrete practical value.

The diversity of particular circuits can be 
given a certain unity by regarding them all as
modifications of the feedback smoothing cir- 
cuit shown originally in Figure 2 of Chapter 
10. In accordance with the discussion of that 
figure it will be convenient to suppose that the 
resistive feedback path is introduced to limit 
the gain of the amplifier proper, so that the 
structure reduces to an amplifier with high but 
finite gain and a pure capacity feedback. The 
circuit has a net loop gain, and is consequently 
degenerative, at any moderately high frequency. 
For our present purposes, it is convenient to 
recall the general property of degenerative 
feedback amplifiers, that they tend to suppress 
any given frequency by the amount of the de- 
generative feedback for that frequency. This 
suppression obtains not only at the amplifier 
output but at many other points in the circuit 
as well. For example, it holds at the amplifier 
input if we combine the original applied volt- 
age with the voltage contributed by the feedback circuit.ᵇ Thus, except for the absolute

b This follows immediately from the fact that, since 
the characteristics of the amplifier proper are not 
changed by the addition of the feedback path, the 
output voltage is always a fixed multiple of the net 
input voltage. 





signal level, it is not necessary to transmit 
through the amplifier of Figure 2 of Chapter 
10 in order to produce the smoothing effect. It 
would be sufficient to hang the input circuit of 
the amplifier, as a two-terminal impedance, 
across the circuit. 


The property of degenerative feedback cir- 
cuits which has just been described is con- 
veniently illustrated by a three-dimensional ex- 
tension of the original smoothing circuit of 
Figure 2 of Chapter 10. The three-dimensional circuit is shown in Figure 1. The three input voltages are the quantities $\dot D$, $D\dot E$, and $D\dot A\cos E$, where D, E, and A are, respectively, slant range, elevation, and azimuth.

Figure 1. Feedback smoothing in three coordinates.

The three voltages will be recognized as the three components
of the target motion in a tilted and rotating 
rectangular coordinate system. One axis of the 
tilted system is directed along the instantaneous line of sight to the target and the other
two are perpendicular to this one in the ver- 
tical and horizontal planes respectively. It is 
assumed that these input rates represent target 
motion in a straight line, plus the usual track- 
ing errors. The object of the smoothing system 
is to provide shunt impedances which will tend 
to suppress the tracking errors by feedback 
action, according to the principles described in 
the preceding section, without disturbing the 
portions of the input voltages corresponding to 
the assumed straight line path. 

We can simplify the analysis by restricting 
our attention to the special case of two-dimen- 
sional motion which occurs when the target 
course lies in a vertical plane passing directly 
through the antiaircraft position. This is illus- 
trated in Figure 2. In this case the component 
$D\dot A\cos E$ is evidently zero. If we represent the voltages at the other two terminals, including both the original applied voltages and the voltages fed back through the circuit, by $V_1$ and $V_2$, the voltages coming out of the coordinate converter on the right-hand side in Figure 2 are

$$v_x = V_1\cos E - V_2\sin E, \qquad v_y = V_2\cos E + V_1\sin E. \qquad (1)$$


These voltages are differentiated, passed through a second coordinate converter, and fed back so that the output voltages must satisfy

$$V_1 = \dot D - \mu\,(\dot v_x\cos E + \dot v_y\sin E), \qquad V_2 = D\dot E - \mu\,(\dot v_y\cos E - \dot v_x\sin E). \qquad (2)$$

In order to exhibit the smoothing action of the circuit let us denote the observed velocity components, referred to the upright and fixed rectangular coordinate system, by $u_x$ and $u_y$, so that

$$u_x = \dot D\cos E - D\dot E\sin E, \qquad u_y = D\dot E\cos E + \dot D\sin E. \qquad (3)$$

Substituting (2) and (3) into (1), we get

$$\mu\dot v_x + v_x = u_x, \qquad \mu\dot v_y + v_y = u_y .$$

These show clearly that $v_x$ and $v_y$ are smoothed values of $u_x$ and $u_y$, respectively. If μ is constant the smoothing is of fixed exponential type. If μ is proportional to the time up to some maximum value, the smoothing is of the variable type described in Sections 14.6 and 14.7.

c This is the coordinate system which was used in the experimental T15 director. A complete prediction circuit can be obtained by using the three voltages described here as inputs to the lead servos in the T15 system. In the actual T15 system, rates in the tilted and rotating coordinate system were obtained by the so-called "memory point" method. The voltages $\dot D$, $D\dot E$, etc., required with the present method, might be obtained with the help of tachometers attached to the tracking shafts to measure the instantaneous values of $\dot D$, $\dot E$, and $\dot A$. An equivalent to the variable smoothing of the memory point method can be obtained by making the gains in the feedback paths in Figure 1 variable according to the principles described in a later section.

To complete the discussion of the circuit we observe that by (1)

$$V_1 = v_x\cos E + v_y\sin E, \qquad V_2 = v_y\cos E - v_x\sin E .$$

These show that $V_1$ and $V_2$ are the smoothed rate components referred to the tilted and
rotating rectangular coordinate system. The 
fact that the orientation of this coordinate sys- 
tem, which depends upon the observed angular 
height E, is not smoothed makes no difference 
to the computation of the leads because this 
computation is made instantaneously in the 
same coordinate system to which the smoothed 
rate components are instantaneously referred. 
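The geometry behind (1) to (3) can be verified numerically. The sketch below assumes a hypothetical straight, unaccelerated course in the vertical plane through the gun (x horizontal, y vertical, velocity components vx and vy constant; the numbers are arbitrary). It computes D, E and their rates along the course, applies (3), and confirms that u_x and u_y stay equal to the true velocity components. That constancy is why the smoothing μv̇ + v = u leaves the straight-line part of the data undisturbed. It also checks that the inverse rotation obtained from (1) returns Ḋ and DĖ.

```python
import math


def observed_rates(x0, y0, vx, vy, t):
    """D, E, dD/dt and dE/dt for straight-line motion x = x0 + vx t, y = y0 + vy t."""
    x = x0 + vx * t
    y = y0 + vy * t
    d = math.hypot(x, y)               # slant range D
    e = math.atan2(y, x)               # elevation E
    d_dot = (x * vx + y * vy) / d      # dD/dt
    e_dot = (x * vy - y * vx) / (d * d)  # dE/dt
    return d, e, d_dot, e_dot


def fixed_components(d, e, d_dot, e_dot):
    """Equation (3): rotate (D', D E') into the upright, fixed rectangular axes."""
    u_x = d_dot * math.cos(e) - d * e_dot * math.sin(e)
    u_y = d * e_dot * math.cos(e) + d_dot * math.sin(e)
    return u_x, u_y


def tilted_components(v_x, v_y, e):
    """Inverse rotation obtained from (1): back to the tilted, rotating axes."""
    return v_x * math.cos(e) + v_y * math.sin(e), v_y * math.cos(e) - v_x * math.sin(e)


if __name__ == "__main__":
    for t in (0.0, 10.0, 30.0):
        d, e, d_dot, e_dot = observed_rates(12000.0, 3000.0, -150.0, 10.0, t)
        u_x, u_y = fixed_components(d, e, d_dot, e_dot)
        v1, v2 = tilted_components(u_x, u_y, e)
        print(f"t={t:4.0f}  u_x={u_x:9.4f}  u_y={u_y:8.4f}  "
              f"V1-D'={v1 - d_dot:.1e}  V2-DE'={v2 - d * e_dot:.1e}")
```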

The analysis in the general case including 
all three coordinates is of the same nature. 
Since the rate components in fixed rectangular 
coordinates appear in the middle of the feed- 
back path, it is perhaps not fair to regard the 
circuit as an illustration of a data-smoothing 
device which does not rely upon the explicit 
isolation of the geometrical parameters of the 
assumed target path. It should be pointed out, 
however, that in comparison with a straight- 
forward geometrical solution in which velocity 
components in fixed coordinates are first isolated 
explicitly, then smoothed, and then used to form the basis of prediction, the circuit in Figure 1 has the advantage that most of the components can be built with very low precision. What is transmitted around the feedback loop is essentially the tracking errors only. Since tracking errors are always small, very high percentage errors in the system can be tolerated.ᵈ




c J 




Figure 2. Feedback smoothing in two coordinates. 


It was mentioned earlier that changing the data-smoothing network with the target coordinates represented one way in which the results obtained from fixed networks could be generalized. In a sense, the coordinate conversions of Figure 1 are illustrations of these possibilities. A better illustration, however, is provided by the circuit of Figure 3. The structure is intended to give smooth slant range rate from slant range data, under the assumption of unaccelerated straight line target motion.

d An exception to this statement must be made for errors in the coordinate converters which fluctuate rapidly with target position.

Figure 3. Feedback smoothing with smoothing variable with position coordinates.
The relation between input and output in Figure 3 is readily seen to beᵉ

$$\mu\,\frac{d}{dt}\left[\frac{d}{dt}(DV)\right] + V = \frac{dD}{dt}, \qquad (4)$$

where μ is the amplifier gain, D is slant range, and V is the output voltage representing slant range rate dD/dt.

The principle of the circuit depends upon the fact that under the assumed target motion the square of the slant range, $D^2$, should be a quadratic function of time, so that $D\,(dD/dt)$ should be a linear function of time and $(d/dt)[D\,(dD/dt)]$ should be a constant. This last is the quantity which is fed back in Figure 3. If it actually is a constant, it has no further influence on the calculation, since the forward circuit includes a differentiator, and the operation of the circuit is the same as though no feedback term were present. This can be verified by setting $D = \sqrt{a + 2bt + ct^2}$, corresponding to ideal straight line flight, in equation (4). It is readily seen that the equation is satisfied by

$$V = \frac{dD}{dt} = \frac{b + ct}{\sqrt{a + 2bt + ct^2}},$$

the first or feedback term being zero.
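The claim that $D^2$ is quadratic in time for unaccelerated straight-line motion follows from $D^2 = |\mathbf r_0 + \mathbf v t|^2$, which gives $a = |\mathbf r_0|^2$, $b = \mathbf r_0\cdot\mathbf v$ and $c = |\mathbf v|^2$ (our own identification of the constants, not stated in the report). A minimal numerical check on a hypothetical three-dimensional course shows that $D\,(dD/dt)$ equals $b + ct$, so the fed-back quantity $(d/dt)[D\,(dD/dt)]$ is the constant $c$:

```python
import math


def slant_range_and_rate(r0, v, t):
    """Slant range D and dD/dt for unaccelerated straight-line motion r(t) = r0 + v t."""
    r = [a + b * t for a, b in zip(r0, v)]
    d = math.sqrt(sum(c * c for c in r))
    d_dot = sum(c * w for c, w in zip(r, v)) / d   # (r . v) / |r|
    return d, d_dot


def quadratic_coefficients(r0, v):
    """D^2 = a + 2 b t + c t^2 with a = |r0|^2, b = r0 . v, c = |v|^2."""
    a = sum(c * c for c in r0)
    b = sum(c * w for c, w in zip(r0, v))
    c = sum(w * w for w in v)
    return a, b, c


if __name__ == "__main__":
    r0, v = (8000.0, -3000.0, 2500.0), (-120.0, 60.0, 5.0)   # hypothetical course
    a, b, c = quadratic_coefficients(r0, v)
    for t in (0.0, 20.0, 40.0, 60.0):
        d, d_dot = slant_range_and_rate(r0, v, t)
        print(f"t={t:4.0f}  D*dD/dt={d * d_dot:12.3f}  b+ct={b + c * t:12.3f}")
```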

If D does not correspond exactly to straight 
line flight, either because of tracking errors
or actual target maneuvers, on the other hand, 
the feedback voltage is no longer constant. In 
this case transmission around the loop can 
exist and the degenerative feedback action 
produces smoothing in both the input and the 
output voltage. In calculating the exact effect 
we must take account of the fact that the feed- 
back voltage depends upon the D potentiometer 
in the feedback circuit as well as upon the out- 
put voltage V. Since the D potentiometer set- 
ting must include the errors in the input data, 
this means that the output voltage is not per- 
fectly smoothed, even with unlimited gain 
around the loop. The percentage error in the 
output rate tends in the limit to approximate 
the percentage error in D itself. For practical 
purposes, however, this is a very satisfactory 
result, since in the absence of smoothing per- 
centage errors in rates are usually many times 
those of the corresponding coordinates. 

It is apparent that it should be possible to 
construct many circuits of this general type 
from the differential equations of the trajec- 
tory. A second example is furnished by Figure 
4. The operation of the circuit is essentially similar to that of Figure 3. It depends upon the fact that in unaccelerated straight line motion the quantity $D^2\dot A\cos^2 E$ is a constant. Instead of multiplying by $D^2$ and $\cos^2 E$ at a single point in the feedback loop, however, separate multiplications by D and cos E are introduced in the forward and feedback circuits. This permits the output to appear as a smoothed value of the quantity $D\dot A\cos E$, which will be recalled as one of the primary quantities in the circuit of Figure 1.

e The condensers in Figure 3 symbolize differentiators.

Figure 4. Another example of feedback smoothing with smoothing variable with position coordinates.
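The constancy of $D^2\dot A\cos^2 E$ is, in effect, conservation of the horizontal "angular momentum" $x v_y - y v_x$: since $D\cos E$ is the horizontal range, $D^2\dot A\cos^2 E = (x^2 + y^2)\dot A = x\dot y - y\dot x$. A minimal check, assuming a hypothetical straight course with x and y horizontal and z vertical (not a case worked in the report):

```python
import math


def azimuth_quantity(r0, v, t):
    """Return D, dA/dt and E for position r0 + v t (x, y horizontal, z vertical)."""
    x, y, z = (a + b * t for a, b in zip(r0, v))
    horizontal = math.hypot(x, y)                     # horizontal range, equal to D cos E
    d = math.sqrt(horizontal**2 + z**2)               # slant range D
    e = math.atan2(z, horizontal)                     # elevation E
    a_dot = (x * v[1] - y * v[0]) / horizontal**2     # dA/dt with A = atan2(y, x)
    return d, a_dot, e


if __name__ == "__main__":
    r0, v = (8000.0, -3000.0, 2500.0), (-120.0, 60.0, 5.0)
    for t in (0.0, 20.0, 40.0, 60.0):
        d, a_dot, e = azimuth_quantity(r0, v, t)
        # Expected constant: x0*vy - y0*vx = 8000*60 - (-3000)*(-120) = 120000
        print(f"t={t:4.0f}  D^2 A' cos^2 E = {d * d * a_dot * math.cos(e) ** 2:.4f}")
```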


In addition to making the parameters of the 
data-smoothing network vary as functions of 
the coordinates of target position we may also 
make them variable as functions of time. The 
advantage of variation with time can be under- 
stood by going back to the discussion of the 
analytic arc assumption and its consequences 
for fixed data-smoothing networks, as given in 
Chapters 9, 10, and 11. It will be recalled that 
for any given settling time there was an opti- 
mum choice of the network's weighting func- 
tion. The choice of the settling time itself, how- 
ever, was always a compromise. On the one 
hand, making the settling time too short led 
to too little smoothing, so that the dispersion 
in the resulting fire became excessive. On the 
other hand, too long a settling time meant that 
data from previous unrelated segments were 
retained in the smoothing circuit during too 
large a proportion of an average individual seg- 
ment of the target path, leaving too small a 
residue of the average segment as useful firing time.

It is evident that it is theoretically possible 
to escape the consequences of this compromise 
by resorting to variable structures. We need 
merely assume that the network always has a 
weighting function appropriate for a settling 
time equal to the time since the last change in 
course. This would give a small amount of 
smoothing shortly after a change in course, 
with more smoothing and consequently greater 
accuracy later on. No firing time, however, is 
sacrificed waiting for the network to settle. 

In order to exploit these possibilities we 
must, of course, be able to design networks to 
give at least approximately the right sequence 
of weighting functions. It is also necessary to
provide some sort of auxiliary controlling 
mechanism which will sense changes in target 
course and return the variable circuits in the 
smoothing network proper to their initial posi- 
tions. These are both difficult problems which 
have been incompletely explored. Some elementary solutions, based principally upon modifications of the degenerative feedback smoothing

circuit of Figure 2, of Chapter 10, are, how- 
ever, given later in the chapter. As a prelimi- 
nary, the next section gives a formal extension 
of the general polynomial expansion method of 
Chapter 11 to the variable case. 


The extension of the general method of 
Chapter 11 to the variable case requires two modifications:

1. The lower limit of the integral to be 
minimized is now taken as zero, in anticipation 
of the possibility of discriminating between rele- 
vant and irrelevant data on the basis of time of 

2. The weighting function may now depend 
more generally upon the variable of integration 
and the upper limit of integration. 

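With these modifications there is no longer any advantage in conducting the analysis in terms of the age variable τ. To deal directly with the minimization of the integral

$$\int_0^t \left[E(\lambda) - \bar E(\lambda)\right]^2 W_0(t,\lambda)\,d\lambda , \qquad (5)$$

let

$$\bar E(\lambda) = V_0 + V_1\,G_1(t,\lambda) + \cdots + V_n\,G_n(t,\lambda), \qquad (6)$$

where $G_m(t,\lambda)$ is an mth-degree polynomial in λ. Also, let

$$\int_0^t W_0(t,\lambda)\,d\lambda = 1, \qquad \int_0^t G_l(t,\lambda)\,G_m(t,\lambda)\,W_0(t,\lambda)\,d\lambda = \begin{cases} 0, & l \neq m,\\ 1/k_m, & l = m,\end{cases} \qquad (G_0 = 1,\; k_0 = 1).$$

Then (5) is a minimum with respect to the $V_m$'s in (6) if

$$V_m(t) = \int_0^t E(\lambda)\,W_m(t,\lambda)\,d\lambda, \qquad (7)$$

where

$$W_m(t,\lambda) = k_m\,G_m(t,\lambda)\,W_0(t,\lambda). \qquad (8)$$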

The possibility of physically realizing the $V_m(t)$ depends upon the possibility of realizing networks with impulsive admittances $W_m(t,\lambda)$, in the sense that $W_m(t,\lambda)$ is the response of a network, at time t, to a unit impulse applied at time λ, where 0 < λ < t. Taking this possibility for granted, the predicted value $\bar E(t + t_f)$ is, according to (6), a variable linear combination of the $V_m(t)$, viz.,

$$\bar E(t + t_f) = V_0(t) + G_1(t, t + t_f)\,V_1(t) + \cdots + G_n(t, t + t_f)\,V_n(t).$$
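The least-squares machinery of (5) to (8) can be illustrated for the simplest nontrivial case. This is a minimal sketch under our own assumptions: uniform weight $W_0(t,\lambda) = 1/t$ (the k = 0 weighting discussed next), a first-degree fit, and the shifted Legendre polynomial $G_1(t,\lambda) = 2\lambda/t - 1$, for which $\int_0^t G_1^2 W_0\,d\lambda = 1/3$ and hence $k_1 = 3$. The integrals are evaluated by the midpoint rule on sampled data. For noiseless linear data the prediction is exact.

```python
def predict_first_degree(samples, t, t_f):
    """First-degree case of (5)-(8) with uniform weight W0 = 1/t on [0, t].

    samples[i] is E at the midpoint lam_i = (i + 0.5) * t / N of the i-th subinterval.
    Returns the prediction E_bar(t + t_f) = V0 + G1(t, t + t_f) * V1.
    """
    n = len(samples)
    h = t / n
    w0 = 1.0 / t                          # W0(t, lam), uniform weighting
    v0 = v1 = 0.0
    for i, e in enumerate(samples):
        lam = (i + 0.5) * h               # midpoint of the i-th subinterval
        g1 = 2.0 * lam / t - 1.0          # G1(t, lam), orthogonal to G0 = 1 under W0
        v0 += e * w0 * h                  # (7) with W_0 = k0*G0*W0, k0 = 1
        v1 += e * 3.0 * g1 * w0 * h       # (7)-(8) with k1 = 3
    g1_future = 2.0 * (t + t_f) / t - 1.0  # G1(t, t + t_f)
    return v0 + g1_future * v1


if __name__ == "__main__":
    t, n = 10.0, 10000
    clean = [2.0 + 0.5 * (i + 0.5) * t / n for i in range(n)]
    noisy = [e + (0.3 if i % 2 == 0 else -0.3) for i, e in enumerate(clean)]
    print(predict_first_degree(clean, t, 3.0))   # close to 2 + 0.5 * 13 = 8.5
    print(predict_first_degree(noisy, t, 3.0))   # alternating "tracking error" largely averaged out
```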

It is clear that all of the $W_m(t,\lambda)$ as well as all of the $G_m(t,\lambda)$ for m = 1, 2, . . . are determined by $W_0(t,\lambda)$. The latter is determined as
the best weighting function for position data 
smoothing, depending upon the characteristics 
of the noise associated with the position data. 
The general methods of determining the best 
weighting function with fixed smoothing time, 
described in Chapter 10, may be used to deter- 
mine the best weighting function with variable 
smoothing time. 

Under the assumption that the spectrum of the noise associated with the signal S(t) has a uniform slope of 6k db per octave, we may take over from Section 11.3 the result that the best weighting function is

$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,\frac{[\lambda(t-\lambda)]^k}{t^{2k+1}}, \qquad 0 \le \lambda \le t. \qquad (10)$$

The response of the network is then

$$V(t) = \int_0^t S(\lambda)\,w_k(t,\lambda)\,d\lambda. \qquad (11)$$

It will be illuminating to consider a few special cases of (11).
For k = 0, we have

$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\,d\lambda. \qquad (12)$$

Multiplying through by t and differentiating we get

$$t\,\dot V(t) + V(t) = S(t). \qquad (13)$$

This suggests the circuit shown in Figure 5.ᶠ
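A quick way to see that the circuit equation (13) really produces the uniform average (12) is to integrate it numerically. In this minimal sketch (the sampling scheme and the starting value are our assumptions) S is sampled at $t_n = (n+1)\,\Delta t$, integration starts at $t = \Delta t$ to avoid the singular point t = 0, and forward Euler is used. For this particular discretization the update reduces to $v_n = v_{n-1} + (S_{n-1} - v_{n-1})/n$, which is exactly the running mean of the samples received so far.

```python
def time_variable_smoother(samples, dt):
    """Forward-Euler integration of t*dV/dt + V = S, equation (13).

    samples[n] is S(t_n) with t_n = (n + 1) * dt; starting at t = dt sidesteps the
    singular point t = 0, and the initial value V(dt) is taken as S(dt).
    """
    v = samples[0]
    out = [v]
    for n in range(1, len(samples)):
        t_prev = n * dt                        # time of samples[n - 1]
        v += dt * (samples[n - 1] - v) / t_prev  # dV/dt = (S - V) / t
        out.append(v)
    return out


if __name__ == "__main__":
    print(time_variable_smoother([1.0, 2.0, 3.0, 4.0], 0.01))   # [1.0, 1.0, 1.5, 2.0]
```

Each output is the mean of the preceding samples, so the smoothing time grows with the time since the network was started, just as (12) requires.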
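For k = 1, we have

$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda)\,\lambda(t-\lambda)\,d\lambda .$$

Multiplying through by $t^3$ and differentiating twice we get

$$\frac{t^2}{6}\,\ddot V + t\,\dot V + V = S,$$

which may be written in the form

$$\left(1 + \frac{t}{3}\,\frac{d}{dt}\right)\left(1 + \frac{t}{2}\,\frac{d}{dt}\right)V = S .$$

This suggests the network shown in Figure 6.ᵍ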
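The normalized weighting function (10) can be checked against the cases just worked out. The sketch below takes (10) in the form $(2k+1)!/(k!)^2\cdot[\lambda(t-\lambda)]^k/t^{2k+1}$, which reduces to 1/t for k = 0 and to $6\lambda(t-\lambda)/t^3$ for k = 1, and evaluates (11) by the midpoint rule (our choice of quadrature). It confirms unit total weight for k = 0, 1, 2. It also shows that for a ramp input S = λ the k = 1 output is V = t/2, which indeed satisfies $(t^2/6)\ddot V + t\dot V + V = S$.

```python
import math


def w_k(k, t, lam):
    """Weighting function (10): (2k+1)!/(k!)^2 * [lam (t - lam)]^k / t^(2k+1)."""
    coeff = math.factorial(2 * k + 1) / math.factorial(k) ** 2
    return coeff * (lam * (t - lam)) ** k / t ** (2 * k + 1)


def smoothed(k, t, signal, n=20000):
    """Response (11), V(t) = integral over [0, t] of S(lam) w_k(t, lam), by the midpoint rule."""
    h = t / n
    return sum(signal((i + 0.5) * h) * w_k(k, t, (i + 0.5) * h) for i in range(n)) * h


if __name__ == "__main__":
    for k in (0, 1, 2):
        print(k, round(smoothed(k, 7.0, lambda lam: 1.0), 8))   # total weight, about 1
    print(round(smoothed(1, 8.0, lambda lam: lam), 8))           # ramp input: about t/2 = 4
```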



By generalizing the above results in various 
ways a large number of other examples of 
variable smoothing networks can be constructed. 
Since unlimited variation in the smoothing 
time is not practically possible, or perhaps even 
tactically optimal, however, it is desirable in 
discussing any further examples to include also 
the possibility that the range of variation in 
the network may be restricted. For any positive integral value of k in (11) the differential equation for V(t) is of the type which may be reduced by the transformation $t = e^{u}$ to a linear differential equation with constant coefficients.ʰ In general, this facilitates the determination of what happens to the weighting function $w_k(t,\lambda)$ when t > T if the variability of the network is stopped at time T. In the case of the first-order equation (13), however, it is just as easy to deal directly in terms of the natural variable t.
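A more general form for (13), which readily yields the effects of a sudden or gradual stoppage of the variability of the network, is

$$\frac{\phi(t)}{\dot\phi(t)}\,\dot V(t) + V(t) = S(t). \qquad (14)$$

This corresponds to the response

$$V(t) = \frac{1}{\phi(t)}\int_0^t S(\lambda)\,\dot\phi(\lambda)\,d\lambda ,$$

whence the weighting function is

$$w(t,\lambda) = \frac{\dot\phi(\lambda)}{\phi(t)}, \qquad 0 < \lambda < t. \qquad (15)$$

f This circuit is due to S. Darlington.

g Due to B. T. Weber.

h See Section A.11 for a more general transformation.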




The general relation (14) may be realized with the network of Figure 5 by varying the resistance in accordance with

$$R = \frac{\phi(t)}{C\,\dot\phi(t)}, \qquad t > 0 .$$

However, a more practical circuit results from the introduction of variable potentiometersⁱ in both the capacity and resistance paths of the original feedback smoothing circuit of Figure 2, Chapter 10. This is shown in Figure 7.ʲ It may be noted that the feedback circuit is also applicable to the two cases discussed in the preceding section. It has the advantage for these applications that it does not require the zero-impedance generators and infinite-impedance loads of Figures 5 and 6.

Figure 5. Time-variable smoothing circuit giving uniform weighting function.

Figure 6. Time-variable smoothing circuit giving parabolic weighting function.

Figure 7. Limited range time-variable feedback smoothing circuit.

i In some cases a variable potentiometer may turn out to be a switch.

j This circuit is due to S. Darlington.

As an example of (14) we may take

$$\phi(t) = \begin{cases} t, & 0 < t < T,\\ T\,e^{(t-T)/T}, & t > T,\end{cases} \qquad \text{so that} \qquad \frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t, & 0 < t < T,\\ T, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T,

$$f_C(t) = t/T,\quad f_R(t) = 0 \quad (0 < t < T); \qquad f_C(t) = f_R(t) = 1 \quad (t > T).$$

This example obviously calls for a linear potentiometer in the condenser path and a switch in the resistance path. The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} 1/t, & 0 < \lambda < t < T,\\ (1/T)\,e^{-(t-T)/T}, & 0 < \lambda < T < t,\\ (1/T)\,e^{-(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

This is illustrated in Figure 8 for T = 10, t = 5, 10, 20.

Figure 8. First example of weighting function produced by circuit of Figure 7.

A second example is furnished by taking

$$\phi(t) = \begin{cases} t^k, & 0 < t < T,\\ T^k\,e^{k(t-T)/T}, & t > T,\end{cases} \qquad \text{so that} \qquad \frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t/k, & 0 < t < T,\\ T/k, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T/k,

$$f_C(t) = t/T,\quad f_R(t) = 1 - 1/k \quad (0 < t < T); \qquad f_C(t) = f_R(t) = 1 \quad (t > T).$$

The first example is a special case of this one. The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} k\lambda^{k-1}/t^k, & 0 < \lambda < t < T,\\ (k\lambda^{k-1}/T^k)\,e^{-k(t-T)/T}, & 0 < \lambda < T < t,\\ (k/T)\,e^{-k(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

This is illustrated in Figure 9 for k = 3/2, T = 10, t = 5, 10, 20.

Figure 9. Second example of weighting function produced by circuit of Figure 7.

A third example is furnished by taking

$$\phi(t) = \begin{cases} t^2, & 0 < t < T,\\ T^2\,e^{2(t-T)/T}, & t > T,\end{cases} \qquad \text{so that} \qquad \frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t/2, & 0 < t < T,\\ T/2, & t > T.\end{cases}$$

Hence, in Figure 7, if RC = T/2,

$$f_C(t) = t/T,\quad f_R(t) = 1/2 \quad (0 < t < T); \qquad f_C(t) = f_R(t) = 1 \quad (t > T).$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} 2\lambda/t^2, & 0 < \lambda < t < T,\\ (2\lambda/T^2)\,e^{-2(t-T)/T}, & 0 < \lambda < T < t,\\ (2/T)\,e^{-2(t-\lambda)/T}, & 0 < T < \lambda < t.\end{cases}$$

This is illustrated in Figure 10 for T = 10, t = 5, 10, 20.

Figure 10. Third example of weighting function produced by circuit of Figure 7.

A fourth example is furnished by taking

$$\phi(t) = e^{kt} - 1, \qquad \frac{\phi(t)}{\dot\phi(t)} = \frac{1 - e^{-kt}}{k}, \qquad t > 0 .$$

Hence, in Figure 7, if RC = 1/k,

$$f_C(t) = f_R(t) = 1 - e^{-kt}, \qquad t > 0 .$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k\,e^{-k(t-\lambda)}}{1 - e^{-kt}}, \qquad 0 < \lambda < t .$$

For any value of t this weighting function is exponential in λ.
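The examples above all follow from (15), $w(t,\lambda) = \dot\phi(\lambda)/\phi(t)$, and can be checked directly. In this minimal sketch the functions `power_then_exponential` (first, second and third examples, for k = 1, general k, and k = 2) and `exponential_start` (fourth example) are our own helpers, and the integration is a midpoint rule. Because the total weight is $[\phi(t) - \phi(0)]/\phi(t)$, it equals one whenever φ(0) = 0, both before and after the variation is stopped at t = T.

```python
import math


def weight(phi, phi_dot, t, lam):
    """Equation (15): w(t, lam) = phi_dot(lam) / phi(t), valid for 0 < lam < t."""
    return phi_dot(lam) / phi(t)


def power_then_exponential(k, T):
    """phi(t) = t**k for t < T and T**k * exp(k (t - T) / T) for t >= T."""
    def phi(t):
        return t ** k if t < T else T ** k * math.exp(k * (t - T) / T)

    def phi_dot(t):
        return k * t ** (k - 1) if t < T else k * T ** (k - 1) * math.exp(k * (t - T) / T)

    return phi, phi_dot


def exponential_start(k):
    """Fourth example: phi(t) = exp(k t) - 1 for t > 0."""
    return (lambda t: math.exp(k * t) - 1.0), (lambda t: k * math.exp(k * t))


def total_weight(phi, phi_dot, t, n=20000):
    """Midpoint-rule integral of w(t, lam) over 0 < lam < t; equals (phi(t) - phi(0)) / phi(t)."""
    h = t / n
    return sum(weight(phi, phi_dot, t, (i + 0.5) * h) for i in range(n)) * h


if __name__ == "__main__":
    for k in (1, 1.5, 2):
        phi, phi_dot = power_then_exponential(k, T=10.0)
        print(k, [round(total_weight(phi, phi_dot, t), 6) for t in (5.0, 10.0, 20.0)])
    phi, phi_dot = power_then_exponential(1, T=10.0)
    print(weight(phi, phi_dot, 20.0, 5.0), 0.1 * math.exp(-1.0))   # middle branch of the first example
```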


Because there has been no demand for varia- 
ble networks in the field of communications, 
the technique of designing practical variable 
networks is in a very rudimentary stage com- 
pared to that of designing fixed networks. In 
the remainder of this chapter we shall describe 



some of the circuits which have been developed 
for specific practical applications. 

A memory point method of obtaining smoothed rates, based upon (12), is illustrated below. If S(t), the quantity to be smoothed, represents the time derivative $\dot E(t)$ of the position data E(t), then the average rate is given by

$$V(t) = \frac{1}{t}\int_0^t \dot E(\lambda)\,d\lambda = \frac{E(t) - E(0)}{t} .$$

Under the assumption that the position data, aside from tracking errors, is a linear function of time, the average rate is also the smoothed rate. If the position data is represented by the angular displacement of a shaft in the computer, the quantity E(0) is readily fixed by providing a second shaft which is coupled to the first shaft until t = 0, when the coupling is broken. Potentiometers mounted on the shafts are energized by a voltage varying as a function of time in the manner indicated in Figure 11. The manner in which the smoothed rate is obtained is clear from the figure.

Figure 11. Memory point method of obtaining smoothed rate.
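The benefit of the memory point can be made concrete. If every position reading carries a tracking error bounded by ε, the rate $[E(t) - E(0)]/t$ is in error by at most $2\varepsilon/t$, so the longer the memory point is held, the better the rate. The sketch below is purely illustrative: the true rate, the starting position and the uniform ±0.1 tracking noise are hypothetical values.

```python
import random


def memory_point_rate(e_now, e_memory, elapsed):
    """Smoothed rate [E(t) - E(0)] / t, i.e. (12) applied to S = dE/dt."""
    return (e_now - e_memory) / elapsed


if __name__ == "__main__":
    rng = random.Random(1)                       # hypothetical tracking noise, uniform in +/-0.1
    true_rate, e0 = 2.0, 5.0
    memory = e0 + rng.uniform(-0.1, 0.1)         # E(0) frozen on the second shaft
    for t in (1.0, 5.0, 20.0, 100.0):
        observed = e0 + true_rate * t + rng.uniform(-0.1, 0.1)
        rate = memory_point_rate(observed, memory, t)
        print(f"t={t:6.1f}  rate={rate:.5f}  error bound={0.2 / t:.5f}")
```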

The memory point method of obtaining smoothed rates is used in the T15 antiaircraft director.⁴ In this application, however, it is somewhat more complicated than in the simple illustration described above. This is due to the fact that the position data and the memory point are in the polar coordinate system, whereas the rate components are referred to a tilted and rotating rectangular coordinate system which is determined by the instantaneous line of sight.

Figure 12 shows a way of securing variable smoothing in a purely electrical circuit.ᵏ Except for the fact that the division of the current through the condensers is varied discontinuously instead of continuously, this circuit corresponds to the first or the second example discussed in Section 14.7.

Figure 12. Specific limited range time-variable feedback smoothing circuit.

Figure 13 shows the variable smoothing circuitˡ for smoothing first derivatives in the M9A1-E1 antiaircraft director.⁸ This circuit corresponds approximately to the second example of the differential equation (14) given above. The variable element is a thermistor which is heated up to a high temperature, practically instantaneously, by the heater, and then allowed to cool off naturally. By choosing the electrical and thermal constants in the circuit correctly the resulting smoothing can be made to approximate that obtained in a memory point circuit.

Figure 13. Another specific limited range time-variable feedback smoothing circuit.

k This circuit is due to S. Darlington.

l Developed by R. F. Wick.

As noted earlier, all these variable circuits 
require some auxiliary control means to reset 
the variable circuits to zero whenever a new 
target is engaged or the current target makes 
a sudden change in course. In the T15 memory 
point system this function was performed by an 
operator. The operator was aided by a series of 
meters which compared the instantaneous 
memory point rates with average rates set in 
some time previously by hand. The visual in- 
dication of a change in course, calling for the 
selection of a new memory point, was a rela- 
tively large, smoothly and decisively varying 
deflection on the meters. In contrast, normal 
tracking errors appeared as relatively small 
random fluctuations of the needles. The circuits 
of Figures 7 and 12, which were intended for 
bombsight applications, were also under the 
control of an operator, who was supposed to 
start the mechanism at the beginning of each 
bombing run. 

Two control methods were used for the cir- 
cuit of Figure 13. In one, large changes in rate, 
corresponding to probable changes in target 

course, were distinguished by comparing the 
instantaneous value of the target rate, as ob- 
tained directly from a differentiator, with the 
smoothed value obtained at the output of the 
smoothing circuit. In the other method, equiva- 
lent information was obtained by again differ- 
entiating the instantaneous value of the target 
rate, making a second derivative of the target 
coordinate. In either case this rate difference 
or second derivative information was used to 
control a gas tube, which went off, supplying 
heating current to the variable thermistor, 
whenever the voltage applied to it exceeded a 
certain threshold. This threshold evidently 
marks the minimum change in course for which 
the variable network will be reset. In order to 
permit the use of a low threshold, without 
making the circuit unduly liable to false opera- 
tion because of the effect of tracking errors, 
the gas tube input voltage was first transmitted 
through a low-pass filter which suppressed 
most of the energy due to tracking errors. A 
considerable amount of work was done on the 
proportioning of this filter to provide the best 
protection against false operation with a low 
threshold and with minimum delay in resetting 
in case a change of course actually does occur, 
but the problem remains an interesting subject 
for research. 
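The trade-off described above, between immunity to tracking errors and delay in resetting, can be explored with a toy model. The sketch below rests entirely on our own assumptions: the rate-difference signal is given directly, the low-pass filter is first order with time constant τ, and the gas tube is modeled as a simple threshold that fires once and re-arms only after the filtered signal falls back below the threshold. With noise bounded by ±0.5 and a threshold of 1.0, the filter output can never reach the threshold before the course change, because each update is a weighted average of its previous value and the current input. After a jump of 5 units it fires within a few steps. A longer τ gives more protection but more delay.

```python
import random


def reset_times(rate_difference, dt, tau, threshold):
    """Indices at which a hypothetical reset pulse would fire.

    rate_difference: instantaneous rate minus smoothed rate, one value per time step.
    A first-order low-pass filter (time constant tau) suppresses tracking-error energy;
    the 'gas tube' is modeled as a threshold on the filtered magnitude, re-armed only
    after the filtered signal has dropped back below the threshold.
    """
    alpha = dt / tau                     # alpha <= 1 keeps each update a weighted average
    filtered = 0.0
    armed = True
    fired = []
    for n, x in enumerate(rate_difference):
        filtered += alpha * (x - filtered)
        if armed and abs(filtered) > threshold:
            fired.append(n)
            armed = False
        elif abs(filtered) <= threshold:
            armed = True
    return fired


if __name__ == "__main__":
    rng = random.Random(0)
    change_at = 500                      # hypothetical course change at step 500
    signal = [rng.uniform(-0.5, 0.5) + (5.0 if n >= change_at else 0.0) for n in range(1000)]
    for tau in (0.05, 0.1, 0.3):
        fired = reset_times(signal, dt=0.01, tau=tau, threshold=1.0)
        print(f"tau={tau}: fired at {fired}, delay = {fired[0] - change_at} steps")
```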



THIS APPENDIX GIVES a summary of linear 
network theory which is pertinent to the 
analysis and design of data-smoothing and 
prediction circuits. It is incomplete in many 
respects and should therefore be supplemented 
by reference to established textbooks on the 
subject. However, it contains some results 
which are new. 

The present summary will be concerned 
mainly with fixed linear networks. Variable 
linear networks will be considered briefly in 
the last section. 


A fixed linear transmission network is one in 
which the response V(t) is related to the im- 
pressed signal E(t) by a linear differential 
equation of the form 

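$$b_n\frac{d^nV}{dt^n} + b_{n-1}\frac{d^{n-1}V}{dt^{n-1}} + \cdots + b_0 V = a_m\frac{d^mE}{dt^m} + a_{m-1}\frac{d^{m-1}E}{dt^{m-1}} + \cdots + a_0 E$$

with constant coefficients. It is well known that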
the solutions of such a differential equation 
obey the "superposition principle." This makes 
it possible to formulate the response of the net- 
work to any signal, in terms of its response to 
certain standard signals. 

A convenient standard signal for analytical purposes is the "unit impulse." It may be regarded as the limit of the rectangular pulse shown in Figure 1 as the duration of the pulse is decreased indefinitely while the amplitude is increased in such a way that the area under the pulse is always unity. The limiting function thus defined does not exist in a strict mathematical sense. However, it is very convenient for analytical purposes, and seldom leads to difficulties, to proceed as though the limiting function did exist. An impulse occurring at t = λ is conventionally denoted by the singular function δ₀(t − λ), where

$$\delta_0(t)=0\quad\text{if } t\neq 0,\qquad \int_{-\infty}^{t}\delta_0(\tau)\,d\tau=\begin{cases}0 & \text{if } t<0,\\ 1 & \text{if } t>0.\end{cases}$$

Figure 1. Rectangular pulse signal.

The response of a fixed network to an impulse or any form of signal is independent of the time at which the signal is applied, provided it is expressed as a function of the time relative to the application of the signal. Let W(t) be the response to the signal δ₀(t). This is called the "impulsive admittance" of the network. Physically, it must be identically zero for negative values of t. For an impulse applied at t = λ the response will therefore be W(t − λ), which is identically zero for t < λ.

Figure 2. Derivation of superposition theorem.

A physical signal E(t) such as the one shown in Figure 2 may be resolved into an infinite succession of elementary impulses. The strength of the typical elementary impulsive component, such as the one shown in Figure 2 as occurring at time λ, is E(λ)dλ. Its contribution to the response at time t is E(λ)·W(t − λ)dλ. Hence the contribution of all the elementary impulsive components of the signal, to the response at time t, is given by the formula

$$V(t)=\int_{-\infty}^{+\infty}E(\lambda)\,W(t-\lambda)\,d\lambda \qquad (2)$$

This is one form of the "superposition theo- 
rem" for fixed linear networks. 
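As a rough numerical companion to (2), the superposition integral can be approximated by summing the contributions E(λ)W(t − λ)Δλ of many narrow impulses. The sketch below is illustrative only: it assumes a hypothetical network with W(t) = αe^(−αt), α = 2 (unit area, identically zero for t < 0), driven by a unit step at t = 0, and the function names are ours. For this case the integral can be done by hand, V(t) = 1 − e^(−αt), which the sum should reproduce.

```python
import math

ALPHA = 2.0  # hypothetical decay constant of the example network


def W_example(tau):
    """Impulsive admittance ALPHA * exp(-ALPHA * tau), identically zero for tau < 0."""
    return ALPHA * math.exp(-ALPHA * tau) if tau >= 0 else 0.0


def E_step(lam):
    """Unit step signal applied at lam = 0."""
    return 1.0 if lam >= 0 else 0.0


def superpose(E, W, t, lam_start=0.0, n=10000):
    """Midpoint-rule approximation of V(t) = integral of E(lam) * W(t - lam) d lam.

    E is assumed to vanish before lam_start, and W(t - lam) vanishes for lam > t,
    so only lam_start <= lam <= t contributes.
    """
    if t <= lam_start:
        return 0.0
    h = (t - lam_start) / n
    total = 0.0
    for k in range(n):
        lam = lam_start + (k + 0.5) * h
        total += E(lam) * W(t - lam)  # one elementary impulse of strength E(lam) d lam
    return total * h


if __name__ == "__main__":
    for t in (0.25, 1.0, 5.0):
        print(t, superpose(E_step, W_example, t), 1.0 - math.exp(-ALPHA * t))
```

Because this W has unit area, the computed response approaches the applied level 1 as t grows, which is the weighted-average behavior discussed below.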

Before discussing the reasons for the limits of integration indicated in (2), it will be helpful to consider a graphical interpretation other than the one used in deriving the integral. Let W(t) be of the form shown in Figure 3, and let E(λ) be of the form shown in Figure 4. To determine the response V(t) at a given value of t, the curve in Figure 3 is turned over from right to left and placed over the curve in Figure 4 so that its right-hand edge is at λ = t. The product of the two curves gives a third curve (not shown), which is identically zero for all λ > t. The area under the third curve is the response V(t) at the given value of t. For progressively larger values of t, the curve representing W(t − λ) in Figure 4 is simply slid to the right with respect to the curve representing E(λ).

Figure 3. An impulsive admittance.

Figure 4. Graphical interpretation of the superposition theorem.

Since a physical signal must certainly be identically zero up to some definite time, or since it must certainly have been applied to the network at some definite time, that time could be taken arbitrarily as zero and (2) could be written in the form

$$V(t)=\int_{0}^{t}E(\lambda)\,W(t-\lambda)\,d\lambda \qquad (3)$$

In this form, however, since

$$\int_{0}^{t}W(t-\lambda)\,d\lambda=\int_{0}^{t}W(\tau)\,d\tau$$

is in general a function of t, the response could not be interpreted as a weighted average of the signal. On the other hand, since

$$\int_{-\infty}^{+\infty}W(t-\lambda)\,d\lambda=\int_{-\infty}^{+\infty}W(\tau)\,d\tau$$

is independent of t, the response may be interpreted as a weighted average of the signal, if

$$\int_{-\infty}^{+\infty}W(\tau)\,d\tau=1 .$$

The necessity of taking the lower limit in (2) as −∞, in order to permit the interpretation of the response as a weighted average of the signal, is also expressed by the point of view that a fixed network cannot make any physical distinction between having no applied signal and having an applied signal which happens to be of zero amplitude.

Another shortcoming of the form (3) or, for that matter, of the form (2) if we set t as the upper limit of integration, comes from the consideration of impulsive admittances of such a nature that W(t − λ) has certain kinds of singularities at λ = t. For example, the case of direct transmission, expressed in the form

$$V(t)=\int_{0}^{t}E(\lambda)\,\delta_0(t-\lambda)\,d\lambda$$

is ambiguous because the singularity in the integrand occurs exactly at one end of the range of integration. However, the form

$$V(t)=\int_{-\infty}^{+\infty}E(\lambda)\,\delta_0(t-\lambda)\,d\lambda$$

leads, without ambiguity, to the result V(t) = E(t). This example is not trivial. Every network which transmits infinite frequency must have an impulsive admittance of such a nature that W(t − λ) contains a singularity of the form δ₀(t − λ). Any attempt to rule out such a singularity on the ground that physical networks cannot in fact transmit infinite frequency complicates the analysis and design of networks unduly. If a network is capable of transmitting, or is expected to transmit, frequencies at the top of the range of interest or importance, it is simpler to assume that it is capable of transmitting, or is expected to transmit, all frequencies above that range.

One other advantage of taking the limits of integration as indicated in (2) may be called to attention. Keeping in mind that E(λ) is identically zero for all values of λ below some definite though perhaps unknown value, and that W(t − λ) is identically zero for all values of λ > t, it is clear that (2) may be integrated partially any number of times without incurring the burden of carrying a string of terms outside of the integral. After one partial integration we have

$$V(t)=\int_{-\infty}^{+\infty}E'(\lambda)\,A(t-\lambda)\,d\lambda \qquad (4)$$

where

$$A(t)=\int_{-\infty}^{t}W(\tau)\,d\tau . \qquad (5)$$

Since E′(λ) is identically zero for all values of λ for which E(λ) is identically zero, and since A(t − λ) is identically zero for all values of λ > t, a second partial integration may be performed with no more formal complication than the first. The fact of the matter is that the terms which ordinarily arise in partial integrations, outside of the integral, are here carried under the integral by singularities of the integrand.

The superposition theorem in the form (4) may be derived directly in a manner similar to the derivation of (2). A(t − λ) is the response of the network to a Heaviside unit step function H(t − λ) applied at t = λ, where

$$H(t-\lambda)=\begin{cases}0 & \text{when } t<\lambda,\\ 1 & \text{when } t>\lambda.\end{cases}$$

The signal is resolved into an infinite succession of elementary step functions of amplitude E′(λ)dλ wherever E(λ) is continuous, and finite step functions of amplitude dE(λ) wherever E(λ) has a finite discontinuity. The contribution of each elementary step function to the response at time t is E′(λ)A(t − λ)dλ; that of each finite step function is A(t − λ)dE(λ). Hence, the response is given formally by (4),ᵃ with the understanding that E′(λ)dλ is to be interpreted as dE(λ) wherever E(λ) is discontinuous.

The response A(t) of the network to a
Heaviside unit step function H(t) applied at
t = 0 is called the "indicial admittance" of the
network. It is more familiar, in the field of 
linear transmission theory, than the impulsive 
admittance to which it is related by (5), but in 
this monograph preference is given to the use 
of the impulsive admittance. In the theory of 
linear differential equations the impulsive ad- 
mittance is known as a Green's function. 
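The equivalence of the impulse form (2) and the step form (4) can also be checked numerically. This sketch reuses the hypothetical example network W(t) = 2e^(−2t), whose indicial admittance (the running integral of W) is A(t) = 1 − e^(−2t) for t > 0, and applies the signal E(t) = 1 + sin t switched on at t = 0. Because E jumps by 1 at t = 0, the step form needs the separate finite-step term A(t − 0)·1, in line with the interpretation of E′(λ)dλ as dE(λ) at a discontinuity. Names and parameter values are illustrative assumptions.

```python
import math

ALPHA = 2.0  # hypothetical example network


def W(tau):
    """Impulsive admittance of the example network."""
    return ALPHA * math.exp(-ALPHA * tau) if tau >= 0 else 0.0


def A(tau):
    """Indicial admittance: the running integral of W."""
    return 1.0 - math.exp(-ALPHA * tau) if tau >= 0 else 0.0


def E(lam):
    """Signal 1 + sin(lam), switched on at lam = 0."""
    return 1.0 + math.sin(lam) if lam > 0 else 0.0


def E_prime(lam):
    """Derivative of E away from its jump at lam = 0."""
    return math.cos(lam) if lam > 0 else 0.0


JUMPS = [(0.0, 1.0)]  # (time, size) of each finite discontinuity of E


def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def v_impulse_form(t):
    """Response from (2): superposition of elementary impulses."""
    return midpoint(lambda lam: E(lam) * W(t - lam), 0.0, t)


def v_step_form(t):
    """Response from (4): elementary steps E'(lam) d lam plus the finite jumps dE."""
    smooth = midpoint(lambda lam: E_prime(lam) * A(t - lam), 0.0, t)
    return smooth + sum(size * A(t - when) for when, size in JUMPS)


if __name__ == "__main__":
    for t in (0.5, 2.0, 5.0):
        print(t, v_impulse_form(t), v_step_form(t))
```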

It is often convenient to express the response so that the variable of integration represents the age of the elementary components of the signal. Introducing the age variable

$$\tau=t-\lambda \qquad (6)$$

into (2), we have

$$V(t)=\int_{-\infty}^{+\infty}E(t-\tau)\,W(\tau)\,d\tau . \qquad (7)$$

ᵃ Formula (4) may be written in the Stieltjes form

$$V(t)=\int_{-\infty}^{+\infty}A(t-\lambda)\,dE(\lambda).$$

Alternatively, we may take the point of view that E′(λ) contains impulsive singularities wherever E(λ) is discontinuous. This point of view is generalized in Appendix B.

In this form it is clear that the weighting of 
signal components is on the basis of age only. 
A fixed network may be said to have a memory
which is a function only of the age of past
signal components.

In the preliminary stages of designing a smoothing network, the weighting function W(τ) is generally prescribed to be identically zero when τ > T, say, as well as when τ < 0. This does not violate the conditions of physical realizability. However, such a weighting function cannot be obtained exactly with a network of a finite number of discrete impedance elements. A finite network invariably yields a weighting function with a "tail" which extends to infinity.


Theoretically, the impulsive admittance of a 
prescribed network may be determined directly 
from the differential equations of the network 
in a perfectly straightforward manner. Prac- 
tically, however, it is very difficult to do so if 
the network has more than two meshes. Fur- 
thermore, the technical problem of designing 
a network directly from a prescribed impulsive 
admittance is even more difficult, particularly 
if the impulsive admittance is not exactly realizable.

These difficulties may be avoided by recourse 
to the highly developed methods of network 
analysis and synthesis used in the field of com- 
munication circuits. These methods are based 
upon the steady-state properties of networks. 

If a signal consisting of the single sinusoid cos ωt is applied to an invariable or fixed linear transmission network, the steady-state responseᵇ will also be a single sinusoid of the same frequency. The amplitude and phase of the response, relative to the signal, will in general depend upon the frequency. The response may be regarded as the resultant of an "inphase component" proportional to cos ωt, and a "quadrature component" proportional to sin ωt, with amplitude coefficients which are functions of the frequency. Furthermore, since the signal is an even function of the frequency,ᶜ the response should also be an even function of the frequency. Hence, the response will be of the form G(ω²) cos ωt − ωH(ω²) sin ωt, where G and H are even real functions of frequency.

By a suitable shift of the origin of time it follows that if the impressed signal is sin ωt, the steady-state response will be of the form G(ω²) sin ωt + ωH(ω²) cos ωt.

ᵇ This is the response apart from transient components, assuming that the latter vanish exponentially with time after the signal is impressed.

ᶜ The signal is also an even function of the time, but this is due only to the particular choice of origin, which is arbitrary.

These two results may be combined into a simpler expression without any loss of individuality. Since e^(iωt) = cos ωt + i sin ωt, where i = √−1, we have

$$V(t)=\left[G(\omega^2)+i\omega H(\omega^2)\right]e^{i\omega t}\quad\text{if}\quad E(t)=e^{i\omega t}.$$

A further simplification may be achieved by replacing iω by p, and G(−p²) + pH(−p²) by Y(p), so that

$$V(t)=Y(p)\,e^{pt}\quad\text{if}\quad E(t)=e^{pt}. \qquad (8)$$

Y(p) is called the "steady-state transmission function," or just "transmission function" for short.

Strictly speaking, (8) expresses the relation
of steady-state response to signal only if p = iω.
However, it is customarily called a steady-state 
relation even when p is not a pure imaginary 
quantity. It may be noted that Y(p) is real 
when p is real. 
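A quick numerical check of the steady-state relation (8) with p = iω: for a hypothetical first-order network obeying (1/α)dV/dt + V = E, equation (1) gives Y(p) = α/(p + α). Driving it with cos ωt from rest and integrating long enough for the transient to die away, the response should settle to G(ω²) cos ωt − ωH(ω²) sin ωt, with G(ω²) = Re Y(iω) and ωH(ω²) = Im Y(iω), i.e. to Re[Y(iω)e^(iωt)]. The classical fourth-order Runge-Kutta integrator is our choice for the sketch, not the monograph's, and α = 2, ω = 3 are arbitrary.

```python
import cmath
import math

ALPHA, OMEGA = 2.0, 3.0  # arbitrary example values


def Y(p):
    """Transmission function of (1/ALPHA) dV/dt + V = E."""
    return ALPHA / (p + ALPHA)


def simulate(t_end, dt=1e-3):
    """RK4 solution of dV/dt = ALPHA * (cos(OMEGA t) - V) from V(0) = 0."""
    f = lambda t, v: ALPHA * (math.cos(OMEGA * t) - v)
    v = 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        k1 = f(t, v)
        k2 = f(t + dt / 2, v + dt * k1 / 2)
        k3 = f(t + dt / 2, v + dt * k2 / 2)
        k4 = f(t + dt, v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return steps * dt, v


def steady_state(t):
    """G(w^2) cos(w t) - w H(w^2) sin(w t), with G and wH read off from Y(i w)."""
    y = Y(1j * OMEGA)
    G, wH = y.real, y.imag
    return G * math.cos(OMEGA * t) - wH * math.sin(OMEGA * t)


if __name__ == "__main__":
    t_end, v = simulate(10.0)
    print(v, steady_state(t_end), (Y(1j * OMEGA) * cmath.exp(1j * OMEGA * t_end)).real)
```

After 10 time units the transient has decayed by roughly e^(−20), so the simulated value and the steady-state formula should agree closely.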

The simplicity of steady-state analysis derives from the fact that time occurs in the signal and throughout the network only in the form e^(pt). In particular, the determination of the transmission function is reduced to the solution of simultaneous algebraic equations which do not involve the time factor. For a network in which the signal and the response are related by the linear differential equation (1) with constant coefficients, we obtain simply

$$Y(p)=\frac{a_0+a_1p+\cdots+a_mp^m}{b_0+b_1p+\cdots+b_np^n}.$$

It may be noted that the poles of the transmis- 
sion function, also referred to as "infinite-gain 
points" in the p-plane, correspond to the roots 
of the characteristic function of the differential 
equation. Physical restrictions on the location 
of infinite-gain points will be considered in Sec- 
tion A.9. 



A relationship between the impulsive admittance and the transmission function of a network may be obtained from (7). Putting E(t) = e^(pt) when t > 0 (and E(t) = 0 when t < 0), we get

$$V(t)=e^{pt}\int_{-\infty}^{t}W(\tau)\,e^{-p\tau}\,d\tau=e^{pt}\int_{-\infty}^{+\infty}W(\tau)\,e^{-p\tau}\,d\tau-e^{pt}\int_{t}^{+\infty}W(\tau)\,e^{-p\tau}\,d\tau . \qquad (9)$$

The second term in (9) is a transient term due to the fact that we have taken E(t) = 0 when t < 0. The first term in (9), which involves the time only through e^(pt), is the steady-state term. Comparing this term with (8) we get

$$Y(p)=\int_{-\infty}^{+\infty}W(t)\,e^{-pt}\,dt \qquad (10)$$

or, in the notation which will be introduced in the next section,

$$Y(p)=L[W(t)] . \qquad (11)$$
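Relation (10) can be evaluated numerically as well. The sketch below, again for the hypothetical W(t) = αe^(−αt) with α = 2, approximates the integral by the composite Simpson rule truncated at a finite upper limit (harmless here because the integrand decays exponentially when R(p) > −α) and compares the result with the closed form α/(p + α) at a few real and complex values of p. The truncation point and grid size are assumptions of the sketch.

```python
import cmath
import math

ALPHA = 2.0  # hypothetical example network


def W(t):
    """ALPHA * exp(-ALPHA t) for t >= 0; W vanishes for t < 0, so the integral starts at 0."""
    return ALPHA * math.exp(-ALPHA * t)


def Y_numeric(p, t_max=40.0, n=40000):
    """Composite Simpson estimate of the integral of W(t) exp(-p t) over [0, t_max]; n even."""
    h = t_max / n
    g = lambda t: W(t) * cmath.exp(-p * t)
    total = g(0.0) + g(t_max)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3


if __name__ == "__main__":
    for p in (0.5, 1 + 2j, 3j):
        print(p, Y_numeric(p), ALPHA / (p + ALPHA))
```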



The frequent use which is made of the Laplace transform and its inverse, in the analysis and design of fixed linear networks, warrants a brief discussion of these transforms.

Given a function f(t) which is identically zero when t < 0, its Laplace transform g(p) is defined by the formula

$$g(p)=L[f(t)]=\int_{0^-}^{\infty}f(t)\,e^{-pt}\,dt . \qquad (12)$$

This is usually written with 0 for the lower limit, but by having the point t = 0 inside the range of integration, instead of at the end, we secure the same advantages for (12) that we gained in the case of (2) by having the point λ = t inside the range of integration. Since f(t) is identically zero when t < 0, we could write −∞ for the lower limit in (12), but this would run the risk of confusion with the so-called "bilateral Laplace transform." On the whole, it is worth while to have a constant reminder that functions f(t) which are not identically zero when t < 0 are ruled out.

The integral in (12) is usually not con- 
vergent for all values of p. That is, in order to 
secure convergence of the integral, it may be 
necessary to assume R(p) >a, where R(p) is 
the real part of p, and a is a real number. The 




result of the integration is a representation of 
g(p) in the half-plane R(p) > a. Since the 
representation is analytic throughout the half- 
plane, the principle of analytic continuation 
allows us to extend the definition of g(p) to 
the remainder of the p-plane.

Given a function g(p) which is analytic throughout the half-plane R(p) > c, where c is a real number, its inverse Laplace transform f(t) is given by the formula

$$f(t)=L^{-1}[g(p)]=\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}g(p)\,e^{pt}\,dp \qquad (13)$$

provided f(t) is identically zero when t < 0. If the result of the integration in (13) is not identically zero when t < 0, g(p) is not a Laplace transform and the application of the inverse transformation to it is meaningless.

Translation Theorem

A useful theorem can be established at this point. This is the translation theorem. If

$$G(p)=L[F(t)],$$

then

$$L^{-1}\left[G(p)\,e^{-pa}\right]=F(t-a)$$

provided that F(t − a) ≡ 0 when t < 0. Translation is to the right or left according as a is positive or negative.

If it happens that F(t) ≡ 0 when t < t₀, where t₀ > 0, then the restriction is that a ≥ −t₀. That is, a limited amount of translation to the left is permissible. In general, t₀ = 0 and the restriction is therefore that a ≥ 0. This theorem follows readily from (12) or (13).

In all of the applications of (13) which we have any occasion to make in the analysis and design of fixed linear networks, the function g(p) may be resolved into a sum of terms of the form G(p)e^(−pa), where a ≥ 0 and G(p) is a rational algebraic function with real coefficients. Making use of the translation theorem, the problem of evaluating L⁻¹[g(p)] reduces to that of evaluating L⁻¹[G(p)]. Now, G(p) may be resolved into a sum of terms of the form p^m or 1/(p − a)^(m+1), where m = 0, 1, 2, .... We shall consider these two cases separately.

The case G(p) = p^m will be treated by means of (12) and some limiting processes. In Section A.1 the unit impulse was regarded as the limit of a rectangular pulse of duration T and amplitude 1/T. By means of (12) the Laplace transform of such a pulse over the interval 0 < t < T is

$$\frac{1-e^{-pT}}{pT}.$$

Hence

$$L[\delta_0(t)]=\lim_{T\to 0}\frac{1-e^{-pT}}{pT}=1 .$$

Formally therefore

$$L^{-1}[1]=\delta_0(t). \qquad (14)$$

Similarly, the Laplace transform of a pulse over the interval a < t < a + T, where a > 0, is

$$e^{-pa}\,\frac{1-e^{-pT}}{pT},$$

so that

$$L[\delta_0(t-a)]=\lim_{T\to 0}e^{-pa}\,\frac{1-e^{-pT}}{pT}=e^{-pa}.$$

Formally therefore

$$L^{-1}\left[e^{-pa}\right]=\delta_0(t-a).$$

The last result follows directly from (14) using the translation theorem.
Next, let

$$\delta_1(t)=\lim_{T\to 0}\frac{\delta_0(t)-\delta_0(t-T)}{T}.$$

This is the limiting case, as shown in Figure 5, of two impulses of strengths 1/T and −1/T separated by a time interval T. It may be called an impulse of second order.

Figure 5. An impulse doublet.

By (12) and the previous results

$$L[\delta_1(t)]=\lim_{T\to 0}\frac{1-e^{-pT}}{T}=p .$$

Formally therefore

$$L^{-1}[p]=\delta_1(t).$$


Proceeding in this fashion we may define an impulse of (m + 1)th order as

$$\delta_m(t)=\lim_{T\to 0}\frac{\delta_{m-1}(t)-\delta_{m-1}(t-T)}{T}$$

and we may then show that

$$L[\delta_m(t)]=p^m .$$

Formally therefore

$$L^{-1}[p^m]=\delta_m(t).$$

This disposes of the case G(p) = p^m, where m = 0, 1, 2, ....
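The limiting processes above can be watched numerically. The sketch evaluates the transform of the unit-area pulse, e^(−pa)(1 − e^(−pT))/(pT), and of the m-fold difference quotient used to build impulses of higher order. Taking the same T at every stage, each difference quotient multiplies the transform by (1 − e^(−pT))/T (a consequence of the translation theorem), so the m-fold quotient of δ₀(t) has transform ((1 − e^(−pT))/T)^m; that shortcut and the sample value p = 2 + i are assumptions of the sketch. As T → 0 the values should approach 1, e^(−pa) and p^m respectively.

```python
import cmath


def pulse_transform(p, T, a=0.0):
    """Laplace transform of a unit-area pulse on a < t < a + T: e^{-pa} (1 - e^{-pT}) / (pT)."""
    return cmath.exp(-p * a) * (1 - cmath.exp(-p * T)) / (p * T)


def higher_order_transform(p, T, m):
    """Transform of the m-fold difference quotient of delta_0 with step T: ((1 - e^{-pT}) / T)^m."""
    return ((1 - cmath.exp(-p * T)) / T) ** m


if __name__ == "__main__":
    p = 2 + 1j
    for T in (1e-1, 1e-3, 1e-5):
        print(T, pulse_transform(p, T), pulse_transform(p, T, a=0.5), higher_order_transform(p, T, 3))
    print("limits:", 1, cmath.exp(-p * 0.5), p ** 3)
```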

The case G(p) = 1/(p − a)^(m+1) will be treated by means of (13) and Jordan's lemma.

Jordan's Lemma

If all the singularities of G(p) can be enclosed by a circle of finite radius with center at the origin, and if G(p) → 0 uniformly with respect to arg p as |p| → ∞, then

$$\lim_{R\to\infty}\int_{\Gamma}G(p)\,e^{pt}\,dp=0,$$

where Γ is a semicircle of radius R, with center at the origin, to the right of the imaginary axis if t is negative, to the left of the imaginary axis if t is positive.

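By the use of this lemma the contour of integration in (13) may be closed and the integration may then be performed by the method of residues. In the case

$$G(p)=\frac{1}{(p-a)^{m+1}},$$

where m = 0, 1, 2, ..., we readily obtain

$$L^{-1}\left[\frac{1}{(p-a)^{m+1}}\right]=\begin{cases}0, & t<0,\\[4pt] \dfrac{t^m e^{at}}{m!}, & t>0.\end{cases} \qquad (18)$$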
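A consistency check on (18) is to run the transform in the forward direction: numerically evaluate the integral (12) of t^m e^(at)/m! and compare it with 1/(p − a)^(m+1). The sketch uses composite Simpson quadrature truncated at a finite time, which is adequate only when R(p) > a so that the integrand decays; a = −1, m = 2 and p = 0.5 + i are arbitrary sample values, and the helper names are ours.

```python
import cmath
import math


def laplace_numeric(f, p, t_max=60.0, n=60000):
    """Composite Simpson estimate of the integral of f(t) e^{-pt} over [0, t_max]; n even."""
    h = t_max / n
    g = lambda t: f(t) * cmath.exp(-p * t)
    total = g(0.0) + g(t_max)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3


def eq18_time_function(a, m):
    """Right-hand side of (18) for t > 0: t^m e^{at} / m!."""
    return lambda t: t ** m * math.exp(a * t) / math.factorial(m)


if __name__ == "__main__":
    a, m, p = -1.0, 2, 0.5 + 1j
    print(laplace_numeric(eq18_time_function(a, m), p), 1 / (p - a) ** (m + 1))
```

Setting a = 0 in the same check reproduces the transform 1/p^(m+1) of t^m/m!.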

An important special case of (18), corresponding to a = 0, is

$$L^{-1}\left[\frac{1}{p^{m+1}}\right]=\begin{cases}0, & t<0,\\[4pt] \dfrac{t^m}{m!}, & t>0.\end{cases} \qquad (19)$$


Another useful theorem which is readily established by means of (12) and (13) is Borel's theorem.

Borel's Theorem

If g(p), g₁(p), g₂(p) are the Laplace transforms of f(t), f₁(t), f₂(t), respectively, and if

$$g(p)=g_1(p)\,g_2(p),$$

then

$$f(t)=\int_{-\infty}^{+\infty}f_1(t-x)\,f_2(x)\,dx=\int_{-\infty}^{+\infty}f_1(\tau)\,f_2(t-\tau)\,d\tau .$$

The functions f₁(t) and f₂(t) are subject to conditions which permit the inversion of the order of integration in the following proof. However, these conditions are seldom of any concern. We have

$$f(t)=L^{-1}\{g_1(p)\cdot L[f_2(t)]\}=\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}g_1(p)\,e^{pt}\left[\int_{0^-}^{\infty}f_2(x)\,e^{-px}\,dx\right]dp .$$

Inverting the order of integration and noting that

$$\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}g_1(p)\,e^{p(t-x)}\,dp=\begin{cases}0 & \text{if } x>t,\\ f_1(t-x) & \text{if } x<t,\end{cases}$$

we obtain the result stated in the theorem.
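Borel's theorem can be illustrated with f₁(t) = e^(−t) and f₂(t) = t (both zero for t < 0), a pair chosen only for convenience. By (18) and (19), g₁(p) = 1/(p + 1) and g₂(p) = 1/p²; their product 1/(p²(p + 1)) splits into partial fractions as 1/p² − 1/p + 1/(p + 1), which by the same formulas is the transform of t − 1 + e^(−t). The sketch computes the convolution integral numerically and compares it with that closed form, and also checks the partial-fraction identity at a sample p. Function names are illustrative.

```python
import math


def f1(t):
    """f1(t) = exp(-t) for t >= 0, zero before."""
    return math.exp(-t) if t >= 0 else 0.0


def f2(t):
    """f2(t) = t for t >= 0, zero before."""
    return t if t >= 0 else 0.0


def convolve(fa, fb, t, n=20000):
    """Midpoint estimate of the integral of fa(t - x) * fb(x) dx.

    Both functions vanish for negative arguments, so only 0 <= x <= t contributes.
    """
    if t <= 0:
        return 0.0
    h = t / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += fa(t - x) * fb(x)
    return total * h


def closed_form(t):
    """Inverse transform of 1/p**2 - 1/p + 1/(p + 1), term by term via (18) and (19)."""
    return t - 1.0 + math.exp(-t) if t > 0 else 0.0


def product_of_transforms(p):
    """g1(p) * g2(p) with g1 = 1/(p + 1) and g2 = 1/p**2."""
    return 1 / (p + 1) / p ** 2


def partial_fractions(p):
    return 1 / p ** 2 - 1 / p + 1 / (p + 1)


if __name__ == "__main__":
    for t in (0.5, 2.0, 4.0):
        print(t, convolve(f1, f2, t), closed_form(t))
    p = 0.7 + 0.3j
    print(product_of_transforms(p), partial_fractions(p))
```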


The result (8) obtained in Section A.2 suggests an operational expression of the form

$$V(t)=Y(p)\cdot E(t) \qquad (20)$$

for the response-to-signal relationship whatever the signal E(t) might be. If the equivalence of this operational expression to (2) is taken as a matter of definition, we may readily discover the nature of the implied operation.

In the light of Borel's theorem, (2) may be expressed in the form

$$L[V(t)]=L[W(t)]\cdot L[E(t)]$$

under the permissible assumption that E(t) ≡ 0 when t < 0. Hence

$$V(t)=L^{-1}\{L[W(t)]\cdot L[E(t)]\}$$

or, by (11),

$$V(t)=L^{-1}\{Y(p)\cdot L[E(t)]\}. \qquad (21)$$

This is, therefore, in general the meaning of the operational expression (20).ᵈ

ᵈ We note that if S(p) = L[E(t)], the operational expression

$$V(t)=S(p)\cdot W(t)$$

is equivalent to (20). This form is used in Section 10.4 and in Appendix B.




The symmetry of the impulsive admittance is expressed by

$$W(T-t)=W(t).$$

Since W(t) = 0 when t < 0, it must be so also when t > T. Hence

$$Y(p)=\int_{0^-}^{T/2}W(t)\,e^{-pt}\,dt+\int_{T/2}^{T^+}W(t)\,e^{-pt}\,dt .$$

By a change of variable of integration the second term may be expressed in the form

$$\int_{0^-}^{T/2}W(T-t)\,e^{-p(T-t)}\,dt$$

or, because of the symmetry,

$$e^{-pT}\int_{0^-}^{T/2}W(t)\,e^{pt}\,dt .$$

Hence, if the first term in Y(p) be

$$Y_1(p)=\int_{0^-}^{T/2}W(t)\,e^{-pt}\,dt,$$

we have

$$Y(p)=Y_1(p)+Y_1(-p)\,e^{-pT}=\left[Y_1(p)\,e^{pT/2}+Y_1(-p)\,e^{-pT/2}\right]e^{-pT/2}.$$

At real frequencies (p = iω) the bracketed factor is evidently an even real function of ω, which we denote by Q(ω²), so that

$$Y(i\omega)=Q(\omega^2)\,e^{-i\omega T/2}. \qquad (24)$$

Apart from discontinuities in the phase angle of the transmission function at real frequencies ω for which Q(ω²) is zero, the phase angle is proportional to frequency. Such a transmission function is referred to as a linear phase transmission function. Sinusoidal components of the signal, of frequencies less than the lowest frequency at which Q(ω²) vanishes, suffer phase retardations in transmission in proportion to their frequencies. These components therefore contribute no delay distortion. They are delayed by a uniform amount, just as they are in a properly terminated, distortionless, uniform transmission line, although in the case of (24) they contribute amplitude or loss distortion through Q(ω²). The delay in (24) is just half of the "smoothing time" T.

Two useful series relationships between impulsive admittances and transmission functions will be derived in this section. Assume that W(t) admits the series expansion

$$W(t)=A_0+A_1t+\cdots+A_m\frac{t^m}{m!}+\cdots \qquad (25)$$

for small positive values of t. Then by (11) and (19), as |p| → ∞,

$$Y(p)\sim\frac{A_0}{p}+\frac{A_1}{p^2}+\cdots+\frac{A_m}{p^{m+1}}+\cdots$$

If A₀ ≠ 0 the transmission cannot drop off faster than 6 db per octave as the frequency increases indefinitely. If the transmission is to drop off ultimately at the rate of 6k db per octave, all of the A's up to and including A_(k−2) must be zero. This is to say that the impulsive admittance and all of its derivatives of orders up to and including the (k − 2)th must vanish at t = 0.

Next, let us suppose that the impulsive admittance and all of its derivatives of orders up to and including the (k − 2)th are continuous through all values of t including t = 0, except that the (k − 2)th derivative is discontinuous at t = a. We may resolve the impulsive admittance into the sum W₁(t) + W₂(t), where W₁(t) and all of its derivatives of orders up to and including the (k − 2)th are continuous through all values of t including t = 0, while W₂(t) = 0 for all values of t < a. Then, for small positive values of t − a,

$$W_2(t)=A_{k-2}\,\frac{(t-a)^{k-2}}{(k-2)!}+\cdots\qquad (A_{k-2}\neq 0),$$

so that, by the translation theorem and (19), the part of Y(p) due to W₂ behaves like e^(−pa)A_(k−2)/p^(k−1) at high frequencies. Hence the transmission cannot drop off ultimately faster than 6(k − 1) db per octave. We may summarize these results in the asymptotic loss theorem.

Asymptotic Loss Theorem.

If the transmission is to drop off ultimately at the rate of 6k db per octave as the frequency increases indefinitely, the impulsive admittance and all of its derivatives of orders up to and including the (k − 2)th must be continuous through all values of t including t = 0.

Discontinuities in W(t) or in some derivative of W(t) cannot occur except at t = 0 in the case of physical lumped element networks. Practically, however, rapid changes in W(t) or in some derivative of W(t), at any value of t, may be expected to be associated with much the same behavior of the transmission at reasonably high frequencies. As an example consider the case

$$W(t)=\frac{\alpha\beta}{\beta-\alpha}\left(e^{-\alpha t}-e^{-\beta t}\right)\quad(\beta>\alpha>0),\qquad Y(p)=\frac{\alpha\beta}{(p+\alpha)(p+\beta)} .$$

W(t) is continuous through t = 0 as long as β is finite but becomes discontinuous there in the limit as β → ∞. The first derivative of W(t) is discontinuous through t = 0 even when β is finite. The ultimate slope of the transmission is 12 db per octave, in accordance with the asymptotic loss theorem, but in the range α < ω < β the transmission appears to have a slope of only 6 db per octave.
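The two slopes quoted for this example are easy to confirm from Y(p) = αβ/((p + α)(p + β)). The sketch takes α = 1 and β = 1000 (arbitrary values chosen to separate the break points widely) and measures the increase in loss, in db, over one octave, i.e. between ω and 2ω. One octave of a 1/ω roll-off is 20 log₁₀ 2 ≈ 6.02 db, which is the "6 db" of the text.

```python
import math

ALPHA, BETA = 1.0, 1000.0  # arbitrary break frequencies with BETA >> ALPHA


def Y(p):
    return ALPHA * BETA / ((p + ALPHA) * (p + BETA))


def gain_db(omega):
    """Transmission in db at the real frequency omega (p = i omega)."""
    return 20 * math.log10(abs(Y(1j * omega)))


def loss_slope_db_per_octave(omega):
    """Increase of loss from omega to 2 * omega."""
    return gain_db(omega) - gain_db(2 * omega)


if __name__ == "__main__":
    for omega in (30.0, 1e5):  # one point between the breaks, one far above both
        print(omega, loss_slope_db_per_octave(omega))
```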

The importance of the observations made in 
the preceding paragraph, in the design of a 
network, is that if we attempt to approximate 
a W(t) which has a discontinuity in a deriva- 
tive of lower order at t = a than at t = 0, the 
fact that the physical approximation must have 
continuous derivatives of all orders and through
all values of t except t = 0 is not very
significant. The ultimate slope of the transmission
may not be reached until the frequency is too 
high to be of any importance. 

Another useful relationship between impulsive admittance and transmission function follows from the assumption that

$$M_m=\int_{0^-}^{\infty}t^m\,W(t)\,dt$$

is finite for m = 0, 1, 2, .... If we expand

$$Y(p)=\int_{0^-}^{\infty}W(t)\,e^{-pt}\,dt$$

into a power series in p we get

$$Y(p)=M_0-M_1\,p+M_2\,\frac{p^2}{2!}-\cdots+(-1)^m M_m\,\frac{p^m}{m!}+\cdots$$

The quantity M_m is the mth moment of the impulsive admittance.

When M₀ = 1 we speak of the response of the network as a weighted average of the impressed signal, and speak of the impulsive admittance W(t) as the weighting function.

The transmission function Y(p) of a lumped element network is a rational algebraic function of p. It is real for real values of p (A.2). Hence, the coefficients must be real, and therefore the roots and poles must either be real or occur in conjugate complex pairs.

Such a function may be expanded into the sum of a polynomial and a rational function whose numerator is of lower degree than the denominator. The latter may therefore be properly expanded into partial fractions. For a partial fraction of the form

$$\frac{A}{(p-a)^m},\qquad m=1,2,\ldots$$

the contribution to the impulsive admittance W(t) is, by (18),

$$L^{-1}\left[\frac{A}{(p-a)^m}\right]=\frac{A\,t^{m-1}e^{at}}{(m-1)!}\qquad(t>0).$$

For a pair of partial fractions of the form

$$\frac{A+iB}{(p-a+i\beta)^m}+\frac{A-iB}{(p-a-i\beta)^m}$$

the contribution to the impulsive admittance is

$$\frac{2\,t^{m-1}e^{at}}{(m-1)!}\left(A\cos\beta t+B\sin\beta t\right)\qquad(t>0).$$

Since the impulsive admittance is the response to an impulsive signal, it is clear that for a stable network the impulsive admittance must be free of terms which increase indefinitely with time, either on account of an amplitude factor of the form e^(at) where a > 0, or, in the event that a = 0, on account of an amplitude factor of the form t^(m−1) where m > 1. Hence, the physical restrictions on the transmission function are:

1. No poles with positive real parts.

2. Poles on the imaginary p axis must be simple.*

The poles of a passive transmission function correspond to modes of free motion. Each of them may be shown to satisfy an equation of the form

$$pT+F+\frac{V}{p}=0,$$

where T, F, V are positive quantities whose values depend upon the particular mode and its activity. However, T is zero in the absence of kinetic energy, F is zero in the absence of energy dissipation, and V is zero in the absence of potential energy. It follows that in the absence of coils or in the absence of condensers, the transmission function must have poles only on the negative real p axis.

* Poles on the imaginary p axis must also be ruled out on the ground that persistent transients cannot be tolerated any more than growing ones.
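The partial-fraction route from Y(p) to W(t), and the two physical restrictions listed above, can be sketched for the common case of simple poles. For a simple pole p_k of N(p)/D(p) with the numerator of lower degree, the residue is N(p_k)/D′(p_k), and by (18) with m = 1 the pole contributes residue·e^(p_k t) for t > 0. In this sketch the pole locations are supplied by hand rather than found by a root finder, and the helper names are ours. The check `satisfies_restrictions` encodes only the two listed conditions; as the footnote notes, simple poles on the imaginary axis would in practice be rejected as well.

```python
import cmath
import math


def poly_eval(coeffs, p):
    """Evaluate c0 + c1 p + c2 p^2 + ... (coefficients in ascending powers)."""
    return sum(c * p ** k for k, c in enumerate(coeffs))


def poly_derivative(coeffs):
    return [k * c for k, c in enumerate(coeffs)][1:]


def impulsive_admittance(num, den, poles):
    """W(t) for Y = num/den, assuming deg num < deg den and that all given poles are simple."""
    dden = poly_derivative(den)
    residues = [poly_eval(num, pk) / poly_eval(dden, pk) for pk in poles]

    def W(t):
        if t < 0:
            return 0.0
        # Conjugate pole pairs carry conjugate residues, so the sum is real.
        return sum(r * cmath.exp(pk * t) for r, pk in zip(residues, poles)).real

    return W


def satisfies_restrictions(poles, tol=1e-9):
    """1. No pole with positive real part; 2. poles on the imaginary axis are simple."""
    for pk in poles:
        if pk.real > tol:
            return False
        if abs(pk.real) <= tol and sum(abs(q - pk) <= tol for q in poles) > 1:
            return False
    return True


if __name__ == "__main__":
    # Y(p) = 3 / ((p + 1)(p + 3)) = 3 / (3 + 4p + p^2)
    W = impulsive_admittance([3.0], [3.0, 4.0, 1.0], [-1.0, -3.0])
    print(W(0.7), 1.5 * (math.exp(-0.7) - math.exp(-2.1)))
    print(satisfies_restrictions([-1.0, -3.0]), satisfies_restrictions([1j, 1j, -1j, -1j]))
```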

For extremely narrow-band, low-pass appli- 
cations, such as data smoothing, it is not prac- 
ticable to build networks which call for coils 
because these generally turn out to be of many 
thousands of henries in inductance. The exclu- 
sion of coils from these applications does not, 
however, rule out transmission functions with 
complex poles. These may be realized with RC 
networks in feedback amplifier circuits as is 
shown in Chapter 12. 


A quasi-distortionless transmission network is one which is distortionless only in a certain sense. This sense will be made clear in this section. Consider a transmission function of the form

$$Y(p)=\frac{1+a_1p+a_2p^2+\cdots+a_mp^m}{1+b_1p+b_2p^2+\cdots+b_np^n}. \qquad (29)$$

This may also be written in the form

$$Y(p)=1+c_1p+c_2\frac{p^2}{2!}+\cdots+c_r\frac{p^r}{r!}+p^{r+1}g(p).$$

Obviously g(p) will be a rational function with the same denominator as Y(p) and a numerator of degree n − 1 at most. If we now apply a signal of the form

$$E(t)=\begin{cases}0 & \text{for } t<0,\\ t^r & \text{for } t>0,\end{cases}$$

the response, by (21), will be

$$V(t)=t^r+r\,c_1t^{r-1}+\frac{r(r-1)}{2!}\,c_2t^{r-2}+\cdots+c_r+r!\,L^{-1}[g(p)]\qquad(t>0).$$

If the coefficients in the rational expression for Y(p) are such that

$$c_1=t_1,\quad c_2=t_1^2,\quad\ldots,\quad c_r=t_1^r,$$

then

$$V(t)=(t+t_1)^r+r!\,L^{-1}[g(p)]\qquad(t>0). \qquad (32)$$

The second term vanishes exponentially with time. The first term is an advanced or a retarded facsimile of the applied signal according to whether t₁ is positive or negative. We shall say that Y(p) is the transmission function of a network which is quasi-distortionless to the signal t^r.

Obviously a transmission network which is quasi-distortionless to the signal t^r must also be quasi-distortionless to every signal t^s, where s is a non-negative integer less than r. Hence we may state the quasi-distortionless transmission theorem.

Quasi-Distortionless Transmission Theorem

If the signal

$$E(t)=\begin{cases}0 & \text{for } t<0,\\ \text{polynomial of degree } r \text{ at most in } t & \text{for } t>0,\end{cases}$$

is applied to a "quasi-distortionless transmission network of order r," the response will be of the form

$$V(t)=E(t+t_1)+O(e^{-t})\qquad\text{for } t>0,$$

where O(e^(−t)) stands for terms which vanish exponentially with time.

If t₁ > 0 the transmission network is a predictor for polynomials of degree r at most. However, it does not begin to predict properly until some time has elapsed after the start of the signal, or of a new analytic segment of the signal; that is, until the transients have subsided sufficiently.

If t₁ = 0 the transmission network may be regarded as a delay-corrected smoother for polynomials of degree r at most. This is obtained simply by taking

$$a_1=b_1,\quad a_2=b_2,\quad\ldots,\quad a_r=b_r$$

in (29).
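The conditions c₁ = t₁, ..., c_r = t₁^r can be checked mechanically by long division of the numerator and denominator of (29) as power series in p, with c_k equal to k! times the kth series coefficient. Comparing with the moment expansion Y(p) = M₀ − M₁p + M₂p²/2! − ..., the same test reads M₀ = 1 and M_k = (−t₁)^k. In the sketch, the example coefficients are constructed by hand for illustration: a predictor with t₁ = 0.5 whose denominator (1 + p)² has its poles in the left half-plane, and a delay-corrected smoother obtained by taking a_i = b_i. The function names are hypothetical.

```python
import math


def series_coefficients(num, den, count):
    """First `count` power-series coefficients of num(p)/den(p) about p = 0.

    num and den hold coefficients in ascending powers; den[0] must be nonzero.
    """
    y = []
    for k in range(count):
        a_k = num[k] if k < len(num) else 0.0
        acc = a_k - sum(den[j] * y[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        y.append(acc / den[0])
    return y


def quasi_distortionless_order(num, den, t1, max_order=10, tol=1e-12):
    """Largest r with Y(0) = 1 and c_k = t1**k for k = 1..r, where c_k = k! * y_k.

    Returns -1 if Y(0) != 1, i.e. if the network is not a weighted average.
    """
    y = series_coefficients(num, den, max_order + 1)
    if abs(y[0] - 1.0) > tol:
        return -1
    r = 0
    for k in range(1, max_order + 1):
        if abs(math.factorial(k) * y[k] - t1 ** k) > tol:
            break
        r = k
    return r


if __name__ == "__main__":
    # Predictor, t1 = 0.5: Y = (1 + 2.5p + 2.125p^2) / (1 + p)^2
    print(quasi_distortionless_order([1, 2.5, 2.125], [1, 2, 1], 0.5))
    # Delay-corrected smoother (a1 = b1, a2 = b2), t1 = 0
    print(quasi_distortionless_order([1, 2, 1], [1, 2, 1, 0.5], 0.0))
```

By the theorem above, the first example should therefore respond to any polynomial of degree 2 or less with E(t + 0.5) once its transients have subsided.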

A.11


A variable linear transmission network is one in which the response V(t) is related to the impressed signal E(t) by the linear differential equation (1) with coefficients which are prescribed functions of t. The solutions of such a differential equation also obey the superposition principle. Thus it is possible in this case also to formulate the response of the network to any signal in terms of its response to a standard impulsive signal.

The response of a variable network to an impulse or any form of signal depends, however, on the time at which the signal is applied. For an impulsive signal applied at time λ the response at time t will be represented by W(t, λ). This is still called the "impulsive admittance." In the theory of linear differential equations it is known as a Green's function. Physically, it must be identically zero for t < λ.

The superposition theorem may now be written in the form

$$V(t)=\int_{0}^{+\infty}E(\lambda)\,W(t,\lambda)\,d\lambda \qquad (34)$$

provided the network has been properly designed and set into operation at t = 0. If

$$\int_{0}^{+\infty}W(t,\lambda)\,d\lambda=1$$

for all values of t > 0, the response may be interpreted as a weighted average of the signal. We note that in order to interpret the response as a weighted average of the signal, it is now no longer necessary to take the lower limit in (34) as −∞, as it was in the case of (2) for a fixed network. In other words, a variable network can be designed and set into operation at any time so that components of the signal which arrive before that time are completely ignored.

The analysis and design of variable linear 
networks are in general much more difficult
than those of fixed linear networks. This is due
largely to the fact that there does not yet exist 
a technique corresponding to the steady-state 
and operational methods used in connection 
with fixed networks. However, there is a class 
of variable networks whose analysis and design 
are greatly facilitated by the fact that they are 
related to fixed networks by a transformation 
of the time variable. 

Consider the linear differential equation

$$b_n\frac{d^nV}{dz^n}+b_{n-1}\frac{d^{n-1}V}{dz^{n-1}}+\cdots+b_1\frac{dV}{dz}+V=E$$

with constant coefficients. With appropriate restrictions on the roots of the characteristic equation

$$b_n\lambda^n+b_{n-1}\lambda^{n-1}+\cdots+b_1\lambda+1=0,$$

it represents the response-to-signal relationship in a fixed network, if z is proportional directly to time. However, if z is a more general function of the time, it will correspond to a variable network. The kind of transformation which is desired here is one which transforms the range −∞ < z < +∞ into the range 0 < t < +∞ with a one-to-one correspondence. Thus, we may take z = log θ(t), where θ(t) is a positive monotonic increasing function of t in the range 0 < t < +∞, with θ(t) → 0 as t → 0. Several examples of θ(t), including θ(t) = t, are considered in detail in Chapter 14.
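As a small illustration of the time transformation (our own worked example, not one taken from Chapter 14), take the first-order case b₁ dV/dz + V = E with z = log θ(t) and θ(t) = t, so that dz = dt/t and the equation becomes b₁ t dV/dt + V = E. The fixed network in z has impulsive admittance (1/b₁)e^(−(z − z′)/b₁); changing variables back to t (dz′ = dλ/λ) gives W(t, λ) = (1/(b₁λ))(λ/t)^(1/b₁) for 0 < λ < t, and zero for λ > t. The sketch checks numerically that the integral of W(t, λ) over 0 < λ < t is 1 for every t > 0, so this variable network forms a weighted average that ignores everything before t = 0, and that a ramp E = λ comes out as t/(1 + b₁). The value b₁ = 0.5 is an arbitrary assumption.

```python
B1 = 0.5  # hypothetical value of the coefficient b1 (must be positive)


def W_var(t, lam, b1=B1):
    """Impulsive admittance of b1 * t * dV/dt + V = E: (lam/t)**(1/b1) / (b1*lam) for 0 < lam < t."""
    if lam <= 0 or lam > t:
        return 0.0
    return (lam / t) ** (1.0 / b1) / (b1 * lam)


def response(E, t, b1=B1, n=4000):
    """Midpoint estimate of (34): integral over 0 < lam < t of E(lam) * W(t, lam) d lam."""
    h = t / n
    total = 0.0
    for k in range(n):
        lam = (k + 0.5) * h
        total += E(lam) * W_var(t, lam, b1)
    return total * h


if __name__ == "__main__":
    for t in (0.01, 1.0, 100.0):
        print(t, response(lambda lam: 1.0, t), response(lambda lam: lam, t), t / (1 + B1))
```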





Best smoothing or weighting functions have been determined in Chapters 10 and 11 under the assumption of random noise with flat spectrum. It has not been worthwhile in practice to base the choice of best weighting functions on any more elaborate consideration of actual noise spectra, for at least three reasons:

1. The effectiveness of a smoothing network 
shape of the weighting function. 

2. Noise spectra are subject to variations, 
due to factors which it is not desirable in prac- 
tice to attempt to control. 

3. Elaborate smoothing functions require 
elaborate networks with close tolerances on ele- 
ment values. 

Nevertheless, the theory of smoothing pre- 
sented in this monograph would not be com- 
plete without showing how more general shapes 
of noise spectra can be considered. Two meth- 
ods are presented here, which are generaliza- 
tions of those presented in Sections 10.3 and 
10.4, respectively. 


Let g(t) be the tracking error, and W(τ) the impulsive admittance of a smoothing and prediction circuit with smoothing time T. Then the error in prediction due to tracking error only is

$$\varepsilon(t) = \int_0^T g(t-\tau)\,W(\tau)\,d\tau.$$

The impulsive admittance W(τ) will depend also upon the time of flight, which, for purposes of analysis, is assumed to be constant. The mean square error is then

$$\overline{\varepsilon^2} = \lim_{T_1\to\infty}\frac{1}{2T_1}\int_{-T_1}^{T_1}\varepsilon^2(t)\,dt = \int_0^T\!\!\int_0^T W(\tau_1)\,C(\tau_2-\tau_1)\,W(\tau_2)\,d\tau_1\,d\tau_2,$$

$$C(x) = \lim_{T_1\to\infty}\frac{1}{2T_1}\int_{-T_1}^{T_1} g(\lambda)\,g(\lambda+x)\,d\lambda. \qquad (1)$$

C(x) is the autocorrelation of the error time-function g(λ).

For an nth-order smoothing and prediction circuit, $\overline{\varepsilon^2}$ is now minimized with respect to the impulsive admittance under the restrictions*


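$$\int_0^T \tau^m\,W(\tau)\,d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \ldots, n). \qquad (2)$$

Hence W(τ) must satisfy the integral equation

$$\int_0^T C(t-\tau)\,W(\tau)\,d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T),$$

where the $k_m$ are constants to be determined. Now, if

$$\int_0^T C(t-\tau)\,W_m(\tau)\,d\tau = t^m \qquad (0 \le t \le T;\ m = 0, 1, 2, \ldots, n), \qquad (3)$$

then

$$W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau). \qquad (4)$$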

The procedure is then to determine C(x) from (1), the $W_m(\tau)$ from (3), the $k_m$ from (2) and (4), and finally W(τ) from (4). It may be noted that, in general, every $k_m$ will be a polynomial of nth degree in $t_f$. Hence the $W_m(\tau)$ appearing here are not the same as those defined in Chapter 11, although W(τ) should be the same if the same W(τ) is used in Chapter 11.

A difficulty of the theory given above lies in the solution of the integral equations (3). This difficulty is avoided in the theory given in the next section. However, the integral equations are easily solved in the case of flat random noise, when C(x) is simply an impulse of strength K, say, at x = 0. Then

$$W_m(\tau) = \frac{\tau^m}{K}, \qquad 0 < \tau < T.$$

Since the strength is irrelevant, it may be taken equal to T so that $W_0(\tau)$ will be normalized.

*These follow from the discussions in Sections A.8 and A.10, especially equations (27), (28), (30), and





For a linear prediction circuit it is then found that

$$W(\tau) = 2\left(2 + \frac{3t_f}{T}\right)W_0(\tau) - \frac{6}{T}\left(1 + \frac{2t_f}{T}\right)W_1(\tau).$$

Putting T = 1, this may be expressed as

$$W(\tau) = W_0(\tau) + G_1(-t_f)\,W_1(\tau)$$

in terms of the $G_m(\tau)$ and $W_m(\tau)$ of Section
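The linear-prediction weighting function above can be checked directly against the restrictions (2): with the flat-noise solutions W₀(τ) = 1/T and W₁(τ) = τ/T (that is, K = T), it should have unit area and first moment −t_f. A short numerical sketch using the midpoint rule; the helper names and the sample values T = 2, t_f = 0.5 are arbitrary.

```python
def linear_prediction_weight(tau, t_f, T):
    """W(tau) = 2(2 + 3 t_f/T) W0 - (6/T)(1 + 2 t_f/T) W1 with W0 = 1/T, W1 = tau/T."""
    w0 = 1.0 / T
    w1 = tau / T
    return 2 * (2 + 3 * t_f / T) * w0 - (6 / T) * (1 + 2 * t_f / T) * w1


def moment(m, t_f, T, n=1000):
    """Midpoint-rule estimate of integral_0^T tau^m W(tau) d tau."""
    h = T / n
    taus = [(i + 0.5) * h for i in range(n)]
    return h * sum(tau ** m * linear_prediction_weight(tau, t_f, T) for tau in taus)


T, t_f = 2.0, 0.5  # arbitrary smoothing time and time of flight
print(moment(0, t_f, T), moment(1, t_f, T))  # should be close to 1 and -t_f
```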


The theory of Phillips and Weiss offers the most direct proof that the best smoothing or weighting function must be symmetrical, regardless of the noise power spectrum. The situation is that of minimizing (1) under only one of the restrictions (2), viz., the normalizing condition

$$\int_0^T W(\tau)\,d\tau = 1. \qquad (5)$$

The weighting function is therefore determined, up to a constant scale factor, by the condition that

$$\int_0^T C(t-\tau)\,W(\tau)\,d\tau = k \qquad (0 \le t \le T), \qquad (6)$$

where k is a constant. Substituting T − t for t and T − τ for τ, we have

$$\int_0^T C(\tau - t)\,W(T-\tau)\,d\tau = k. \qquad (7)$$

Since C(−x) = C(x), equation (7) states that W(T − τ) also satisfies (6); and since W(τ) is determined uniquely by (6) and (5), it follows from (6) and (7) that

$$W(T-\tau) = W(\tau). \qquad (8)$$
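The symmetry argument can also be seen numerically. The sketch below discretizes (6) on a midpoint grid for an arbitrary, assumed noise autocorrelation (a small white component plus two damped terms; any even, positive-definite C would serve), solves the linear system by Gaussian elimination, normalizes by (5), and measures how far the weights depart from W(T − τ) = W(τ). It is a sanity check of the argument, not a design procedure from the text.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def best_weights(corr, white, T=1.0, n=60):
    """Discretize (6) with C(x) = white * delta(x) + corr(x) and normalize by (5)."""
    h = T / n
    taus = [(i + 0.5) * h for i in range(n)]
    # The delta part of C contributes `white` on the diagonal; corr contributes corr * h.
    a = [[corr(ti - tj) * h + (white if i == j else 0.0)
          for j, tj in enumerate(taus)] for i, ti in enumerate(taus)]
    w = solve(a, [1.0] * n)  # right-hand side k = 1; the scale is fixed below
    area = h * sum(w)
    return [wi / area for wi in w], h


corr = lambda x: math.exp(-3 * abs(x)) + 0.5 * math.exp(-abs(x)) * math.cos(4 * x)
w, h = best_weights(corr, white=0.05)
asym = max(abs(w[i] - w[-1 - i]) for i in range(len(w)))
print(f"area={h * sum(w):.6f}  asymmetry={asym:.2e}  end/middle={w[0] / w[len(w) // 2]:.3f}")
```

The weights are far from flat for this noise, yet the asymmetry is at the level of rounding error, as (8) predicts.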


The noise power transmitted through a network may be expressed in the familiar form

$$P = \int_{-\infty}^{\infty} N(\omega^2)\,|Y(i\omega)|^2\,d\omega,$$

where N(ω²) is the noise power spectrum and Y(p) is the transmission function of the network. Assuming that N(ω²) is a rational function of ω² which is finite at all finite values of ω, including zero, it is possible to determine a rational function S(p) which has no poles on or to the right of the imaginary axis in the p-plane, with the exception of the point at infinity, and such that

$$|S(i\omega)|^2 = N(\omega^2).$$

It may be readily shown that

$$P = 2\pi\int_{-\infty}^{\infty}[F(t)]^2\,dt, \qquad (9)$$

where F(t) is related to the impulsive admittance W(t) by the operational equation

$$F(t) = S(p)\,W(t). \qquad (10)$$

The problem is now to minimize (9) under the restriction

$$\int_0^{\infty} W(t)\,dt = 1, \qquad W(t) = 0 \ \text{when}\ t > 1. \qquad (11)$$


Let $S(p) = k\,Q(p)/H(p)$, where

$$Q(p) = (p+\alpha_1)(p+\alpha_2)\cdots(p+\alpha_m), \qquad H(p) = (p+\beta_1)(p+\beta_2)\cdots(p+\beta_n),$$

and the constant k is of no consequence. One or more of the α's, but none of the β's, may be zero. Since the existence of the integral in (9) imposes the requirement that F(t) have no discontinuities of higher type than finite jumps in the range $-\infty < t < \infty$, the continuity conditions on W(t) in (10) must depend upon the difference between m and n in the expressions for Q(p) and H(p).

If m > n, it is fairly obvious that W(t) must be differentiable, in the ordinary sense, exactly m − n times. In other words, W(t) and all its derivatives up to and including the (m − n − 1)th must be continuous, but the (m − n)th derivative may have finite jumps. If m < n, we must consider the introduction into W(t) of discontinuities of higher type than finite jumps. These discontinuities arise in the formal extension of the concept of differentiation to functions containing finite jumps.

If a function φ(t) has a finite jump of amplitude A at t = a, the value of φ′(t) at that point will be indicated formally as $A\,\delta_0(t-a)$, where $\delta_0(t-a)$ is a unit impulse at t = a. If $\varphi'(a+0) - \varphi'(a-0) = A_1$, the value of φ″(t) at t = a will be indicated formally as $A_1\,\delta_0(t-a) + A\,\delta_1(t-a)$, where $\delta_1(t-a)$ is a unit doublet at t = a; and so on, for higher derivatives of φ(t).

The expression (9) is a minimum under the restriction (11) if W(t) satisfies the differential equation

$$Q(p)\,Q(-p)\,W(t) = \text{const.} \qquad (0 < t < 1) \qquad (12)$$

and Y(p) satisfies the condition

$$\frac{1}{2\pi i}\int_{-i\infty}^{i\infty} S(p)\,S(-p)\,Y(p)\,e^{pt}\,dp = \text{const.} \qquad (0 < t < 1). \qquad (13)$$

The restriction (11) itself requires that W(t) = 0 when t > 1, and

$$\int_0^1 W(t)\,dt = 1. \qquad (14)$$


Case I. (n = 0)

The general solution of (12) contains 2m + 1 constants of integration, which are determined by (14) and the 2m continuity conditions that W(t) and all of its derivatives up to and including the (m − 1)th must vanish at t = 0 and t = 1.

Case II. (n ≠ 0, m ≥ n)

The general solution of (12) contains 2m + 1 constants of integration, which are reduced to 2n in number by (14) and the 2(m − n) continuity conditions that W(t) and all of its derivatives up to and including the (m − n − 1)th must vanish at t = 0 and at t = 1. The remaining 2n constants are determined by (13).

The left-hand member of (13) may be formulated by the method of residues. The expression for Y(p) should first be separated into two parts so that

$$Y(p) = Y_L(p) + Y_R(p)\,e^{-p},$$

where $Y_L(p)$ and $Y_R(p)$ are rational functions of p. Since 0 < t < 1, the factor $e^{pt}$ vanishes at infinity in the left-hand half of the p-plane and $e^{p(t-1)}$ in the right-hand half, so the path of integration in (13) may be closed in the left-hand half for the first part of Y(p), and in the right-hand half for the second part. Hence, if the sum of the residues of $S(p)\,S(-p)\,Y_L(p)\,e^{pt}$ in the left-hand half of the p-plane is denoted by $\Sigma_L$, and if the sum of the residues of $S(p)\,S(-p)\,Y_R(p)\,e^{p(t-1)}$ in the right-hand half of the p-plane is denoted by $\Sigma_R$, then the condition (13) reduces to

$$\Sigma_L - \Sigma_R = \text{const.} \qquad (15)$$

Case III. (n ≠ 0, m < n)

The 2m + 1 constants of integration in the general solution of (12) are first increased to 2n + 1 by appending the 2(n − m) singularities

$$\delta_0(t),\ \delta_1(t),\ \ldots,\ \delta_{n-m-1}(t), \qquad \delta_0(t-1),\ \delta_1(t-1),\ \ldots,\ \delta_{n-m-1}(t-1),$$

and then reduced to 2n by (14). The remainder are determined by (13) or (15). In formulating Y(p), it may be noted that

$$\mathcal{L}[\delta_n(t-a)] = p^n e^{-ap} \qquad (a \ge 0).$$

Example of Case I

Let $S(p) = p^m$. The differential equation (12) requires W(t) to be a polynomial of degree 2m. The conditions at t = 0 require it to have a factor $t^m$, and those at t = 1 a factor $(1-t)^m$. This leaves only (14) to be satisfied. Hence

$$W(t) = \frac{(2m+1)!}{(m!)^2}\,[t(1-t)]^m \qquad (0 \le t \le 1),$$

in agreement with (8) of Section 10.8.
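As a quick numerical check of the Case I result, the sketch below evaluates W(t) = ((2m+1)!/(m!)²)[t(1 − t)]^m and integrates it with Simpson's rule; the area should be 1 for every m, and m = 1 gives 6t(1 − t). The helper names are illustrative.

```python
import math


def case_i_weight(t, m):
    """W(t) = (2m+1)!/(m!)^2 [t(1-t)]^m on 0 <= t <= 1."""
    return math.factorial(2 * m + 1) / math.factorial(m) ** 2 * (t * (1 - t)) ** m


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


for m in range(6):
    print(m, simpson(lambda t: case_i_weight(t, m), 0.0, 1.0))
```

The factor [t(1 − t)]^m makes W and its first m − 1 derivatives vanish at both ends, which is exactly the continuity requirement of Case I.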

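Example of Case II

Let $S(p) = (p+\alpha)/(p+\beta)$. Then, by (12),

$$W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 \le t \le 1),$$

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p+\alpha} + \frac{A_2}{p-\alpha} - \left[\frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p+\alpha} + \frac{A_2 e^{\alpha}}{p-\alpha}\right]e^{-p}.$$

Condition (15) is satisfied if

$$A_1 = \tfrac{1}{2}A_0 Q\,e^{\alpha/2}, \qquad A_2 = \tfrac{1}{2}A_0 Q\,e^{-\alpha/2},$$

where

$$Q = \frac{\alpha^2 - \beta^2}{\beta\left(\alpha\sinh\frac{\alpha}{2} + \beta\cosh\frac{\alpha}{2}\right)},$$

so that $W(t) = A_0\left[1 + Q\cosh\alpha\left(t - \tfrac{1}{2}\right)\right]$, with $A_0$ fixed by (14). In the limit as α → 0,

$$W(t) = \frac{12 + 6\beta + 6\beta^2\,t(1-t)}{12 + 6\beta + \beta^2} \qquad (0 \le t \le 1).$$

In terms of expressions (12), Section 11.3,

$$W(t) = \frac{W_0(t) + k\,W_1(t)}{1+k} \qquad (0 \le t \le 1),$$

where $k = \beta^2/[6(2+\beta)]$. This is reminiscent of Stibitz's results mentioned in Section 10.3.

Example of Case III

Let $S(p) = 1/(1 + p/\beta)$. Then, by (12) and the rule for appending singularities in Case III,

$$W(t) = A_0 + A_1\,\delta_0(t) + A_2\,\delta_0(t-1) \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \frac{A_0}{p} + A_1 + \left(A_2 - \frac{A_0}{p}\right)e^{-p}.$$

Condition (15) is satisfied if $A_1 = A_2 = A_0/\beta$, and (14) then gives

$$W(t) = \frac{\beta + \delta_0(t) + \delta_0(t-1)}{2+\beta} \qquad (0 \le t \le 1).$$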
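The Case II and Case III weightings can be checked independently of the residue calculus. For S(p) = (p + α)/(p + β), |S(iω)|² = 1 + (α² − β²)/(ω² + β²), so minimizing (9) under (14) amounts (by our own calculation, not taken from the text) to requiring W(t) + ((α² − β²)/(2β)) ∫₀¹ e^{−β|t−u|} W(u) du to be constant on 0 < t < 1. For S(p) = 1/(1 + p/β) the corresponding requirement is that ∫₀¹ e^{−β|t−u|} W(u) du be constant. The sketch tests both with Simpson's rule, using W = 1 + Q cosh α(t − ½) with Q = (α² − β²)/[β(α sinh(α/2) + β cosh(α/2))] for Case II, and W = (β + δ₀(t) + δ₀(t − 1))/(2 + β) for Case III, whose end impulses are added analytically. Function names are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule; returns 0 for an empty interval."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


def kernel_integral(w, t, beta):
    """integral_0^1 exp(-beta |t - u|) w(u) du, split at the kink u = t."""
    k = lambda u: math.exp(-beta * abs(t - u)) * w(u)
    return simpson(k, 0.0, t) + simpson(k, t, 1.0)


def case_ii_spread(alpha, beta, ts):
    """Spread of W + c * kernel_integral over ts; zero means the condition holds."""
    q = (alpha ** 2 - beta ** 2) / (
        beta * (alpha * math.sinh(alpha / 2) + beta * math.cosh(alpha / 2)))
    w = lambda u: 1.0 + q * math.cosh(alpha * (u - 0.5))
    c = (alpha ** 2 - beta ** 2) / (2 * beta)
    vals = [w(t) + c * kernel_integral(w, t, beta) for t in ts]
    return max(vals) - min(vals)


def case_iii_values(beta, ts):
    """kernel_integral for W = (beta + delta0(t) + delta0(t - 1)) / (2 + beta)."""
    a0, imp = beta / (2 + beta), 1 / (2 + beta)
    return [kernel_integral(lambda u: a0, t, beta)
            + imp * (math.exp(-beta * t) + math.exp(-beta * (1 - t))) for t in ts]


ts = [0.1, 0.3, 0.5, 0.7, 0.9]
print(case_ii_spread(2.0, 1.0, ts), case_ii_spread(0.5, 3.0, ts))
print(case_iii_values(3.0, ts), 2 / (2 + 3.0))
```

In Case III the constant works out to 2/(2 + β): the impulses at the ends exactly make up for the missing exponential tails of the continuous part.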





1. The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Norbert Wiener, OSRD 370, Report to the Services 19, Research Project DIC-6037, The Massachusetts Institute of Technology, Feb. 1, 1942. Div. 7-318.1-M2

1a. Ibid., Chapter 1.

2. The Analysis and Design of Servomechanisms, Herbert Harris, Jr., OSRD 454, Progress Report to the Services 23, The Massachusetts Institute of Technology. Div. 7-321.1-M7

3. Behavior and Design of Servomechanisms, Gordon S. Brown, OSRD 89, Progress Report 2, The Massachusetts Institute of Technology, November 1940. Div. 7-321.1-M1

4. Antiaircraft Director T-15, OEMsr-358, Report to the Services 62, Western Electric Company, Inc., August 1943. Div. 7-112.2-M6

5. The Analysis and Synthesis of Linear Servomechanisms, Albert C. Hall, OSRD 2097, Report to the Services 64, The Massachusetts Institute of Technology, May 1943. Div. 7-321.1-MS

6. Antiaircraft Director T-15-E1, E. L. Norton, OEMsr-358, Report to the Services 98, Bell Telephone Laboratories, Inc., July 30, 1945. Div. 7-112.2-M11

7. Theoretical Calculation on Best Smoothing of Position Data for Gunnery Prediction, R. S. Phillips and P. R. Weiss, OEMsr-262, AMP Note 11, Report 532, The Massachusetts Institute of Technology, Radiation Laboratory, Feb. 16, 1944. Div. 14-244.4-M1

8. A Long Range, High-Angle Electrical Antiaircraft Director [Final Report on T-10], C. A. Lovell, NDCrc-127, Research Project 2, Division 7 Report to the Services 80, Bell Telephone Laboratories, Inc., June 24, 1944. Div. 7-112.2-M9

9. Flight Records of Pitch, Roll, and Yaw, taken in a variety of bombers at Wright Field, Ohio, Sperry Gyroscope Company, 1942-5.

10. Design and Performance of Data-Smoothing Network, R. B. Blackman, OEMsr-262, Report MM-44-110-38, [Bell Telephone Laboratories, Inc.], July 8,

11. Computer for Controlling Bombers from the Ground, E. Lakatos and H. G. Och, OEMsr-262, July 24, 1944.

12. A Position and Rate Smoothing Circuit for Ground-Controlled Bombing Computers, R. B. Blackman, OEMsr-262, Report MM-44-110-79, [Bell Telephone Laboratories, Inc.], Aug. 21, 1944.

13. A Two-Servo Circuit for Smoothing Present Position Coordinates and Rate in Antiaircraft Gun Directors, R. B. Blackman, Contract W-30-069-ORD-1448, Report MM-44-110-65, [Bell Telephone Laboratories, Inc.], Sept. 27, 1944.

14. The Theory of Electrical Artificial Lines and Filters, A. C. Bartlett, John Wiley and Sons, Inc., 1931, p. 28.

15. Network Analysis and Feedback Amplifier Design, H. W. Bode, D. Van Nostrand Company, 1945.

15a. Ibid., Chapters 7, 8, 13, and 14.

15b. Ibid., p. 313.

15c. Ibid., p. 326.

15d. Ibid., p. 301.

15e. Ibid., p. 38.

15f. Ibid., p. 12.

15g. Ibid., p. 78.

15h. Ibid., p. 110.

15i. Ibid., p. 133.

15j. Ibid., Chapter 6.

16. Fundamental Theory of Servomechanisms, L. A. MacColl, D. Van Nostrand Company, 1945.

17. Automatic Control Engineering, E. S. Smith, McGraw-Hill Book Company, Inc., 1944.

18. Die Lehre von den Kettenbrüchen, B. G. Teubner, Leipzig, 1913.

19. "Transient Oscillations in Wave Filters," J. R. Carson and O. J. Zobel, Bell System Technical Journal, July 1923.

20. "Harmonic Analysis of Irregular Motion," Norbert Wiener, Journal of Mathematics and Physics, Vol. 5, 1926, pp. 99-189.

21. "Generalized Harmonic Analysis," Norbert Wiener, Acta Mathematica, Stockholm, Vol. 55, 1930, pp. 117-258.

22. "Stochastic Problems in Physics and Astronomy," S. Chandrasekhar, Reviews of Modern Physics, Vol. 15, 1943, pp. 1-89.

23. "Mathematical Analysis of Random Noise," S. O. Rice, Bell System Technical Journal, Vol. 23, 1944, pp. 282-332.

23a. Ibid., Vol. 24, 1945, pp. 46-156.



Cover Sheet for Technical Memoranda
Research Department

Subject: The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20876

MM-46-110-49
Date: April 10, 1946
Authors: C. L. Dolph, C. E. Shannon
Index No. W1.416

1 - H.W.BW.B*F.-H.F#-Case Files
2 - Case Files
3 - L.G. Abraham - T.E. Brewer
4 - C.H. Elmendorf - H.K. Krist
5 - H.S. Black - F.B. Anderson
6 - G.N. Thayer - C.W. Harrison
7 - R.L. Dietzold
8 - L.A. MacColl
9 - B.M. Oliver
10 - C.L. Dolph
11 - C.E. Shannon

Asymptotic expressions for the transient 
response of a long chain of four-terminal unilateral 
linear networks connected in tandem subject to an 
initial disturbance are developed and classified accord- 
ing to the characteristics of the common transfer ratio. 
It is shown that a necessary and sufficient condition for the stability of the chain for all n is that the transfer ratio be of the high-pass type.

The mathematical results are applied to 
chains of self-regulating telephone repeaters. 

The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20878

MM-46-110-49
April 10, 1946



The transient response behavior of a long chain of invariable four-terminal networks connected unilaterally in tandem is of primary importance in the design of cross-country wire communication systems, since the successful operation of such equipment depends upon the rapid damping of transients caused by suddenly applied inputs.

While the emphasis in this memorandum will be directed toward coaxial systems consisting of self-regulating repeaters spaced at 3-7 mile intervals and spanning distant points, the results are of a more general nature and would apply, with obvious modifications and corresponding interpretations, to any configuration involving a large number of four-terminal linear invariable networks connected unilaterally in tandem.

It will be shown that there are two fundamentally different types of transient response possible, depending upon the gain characteristic of the transfer ratio of the individual four-terminal linear networks comprising the system. The first type of response, while satisfactory, is difficult to achieve in practice because of the stringent requirements on the gain characteristic of the transfer ratio. The second, a case often encountered in practice, will be shown to be unsatisfactory in general, since it leads to build-up and overloading in any physical system comprising a large number of such networks. However, a guiding design principle will be suggested which, it is believed, will enable us to minimize the worst of the effects and make possible the successful operation of a system of the type envisaged here.

This memorandum is divided into two parts. In the first, the problem is defined physically and then formulated mathematically. Following this, the history of the problem is discussed briefly, after which the new results are summarized. Finally, this part concludes with a discussion of their interpretation and implications for the coaxial system. The second part presents the detailed mathematical arguments which led to the new results of part one.


Statement of the Problem 

The analysis in this memorandum is directed toward the understanding of certain anomalous effects which a long chain of self-regulating telephone repeaters may exhibit at its output when the input end of the chain is subject to a transient disturbance (cf. Figure 1).

The gain settings of the repeaters in such a chain are usually controlled by the level of a pilot frequency somewhere in the communication band, and the regulation is designed to compensate for low-frequency phenomena (up to approximately one cycle per second) such as the diurnal change in line resistance. The repeaters in the chain are normally absolutely stable devices, so that any transient which is presented to the input of any one of them will be evanescent in time at the output of that repeater.

Since transients are not damped out instantaneously even in absolutely stable devices, a transient disturbance at the input to the first repeater in such a chain will be propagated down the chain. It has been experimentally observed that under certain conditions the maximum amplitude of a transient disturbance may increase as the disturbance is propagated from one repeater to the next, and in some cases there may be many oscillations of sufficiently large amplitude to render the system inoperative because of prolonged overloading.

If the entire chain from its input to its output end is considered as a whole, the chain then behaves in many respects like an unstable non-linear device, in spite of the fact that each repeater in the chain is absolutely stable.

Since it is obvious that the above type of behavior is at best undesirable in a cross-country link, it is necessary that its cause be thoroughly understood and that all possible steps be taken either to suppress it or, if this is not possible, at least to minimize its effects.

Although it is not reasonable to expect that transient oscillations can be kept from propagating down the line, or that it is possible to isolate the line from all transient disturbances, it is reasonable to seek a means of guaranteeing that the transients propagated down the line will never possess amplitudes exceeding the magnitude of the original disturbance, or to seek a way to guarantee that the maximum response of the transient oscillations will occur so shortly after the initial disturbance that physical apparatus will be incapable of following it or distinguishing it from the unavoidable initial disturbance. A way of guaranteeing the first of these will be discussed at length, and a suggestion will be made which, it is felt, will guarantee the second, although no rigorous proof of this last fact has yet been given.

Fig. 2 represents a schematic drawing of a typical 
satisfactory type of transient response which might result from 
a unit step input to the first unit of Fig. 1. Fig. 3, on the 
other hand, represents a schematic drawing of a typical unsatis- 
factory type of transient response which could result from the 
same input to a system of the type of Fig. 1 which had different 
characteristics. Briefly then, the problem to be discussed is 
that of determining the relationships between the network 
characteristics and the transient response for networks of the 
form of Fig. 1. 

Mathematical Formulation of the Problem 

A sudden change in level in the pilot frequency before the n-th repeater results in the modulation of this frequency, changing it from its normal form

$$A\sin\omega t$$

to

$$A\sin\omega t\,[1 + f(t)],$$

where f(t) represents the modulation introduced by the transient.

After passage through the n-th repeater, this last expression is transformed into

$$A\sin(\omega t + \varphi)\,[1 + g(t)],$$

where the repeater and regulator have (possibly) changed the carrier by the addition of the phase angle φ and have modified the original envelope A[1 + f(t)] into A[1 + g(t)].

It is clear that from the standpoint of regulation it is sufficient to limit discussion to the transformation of f(t) into g(t).*

The exact relationship between f(t) and g(t), of course, depends upon the characteristics of the repeater-regulator circuits, which are in general non-linear. However, for small signal inputs their behavior may be satisfactorily represented by that obtained from a linear invariable four-terminal network. Thus, for the purpose of mathematical analysis, the chain of self-regulating repeaters may be replaced by a chain of linear invariable four-terminal networks having a common transfer ratio y(p), and the blocks of Fig. 1 will be idealized as such linear four-terminal networks throughout the analysis.

Because regulation is designed to compensate for low-frequency phenomena, certain characteristics that y(p) should possess are known a priori, namely:

(1) y(p) must represent a high-pass system; that is, y(p) → 1 as p → ∞.

(2) y(0) should be zero if, in the terminology of servo theory, there is to be no static error.


In terms of y(p), the design of a self-regulating 
system reduces to two problems: 

(I) Given y(p), to calculate the transient behavior of 
the chain of self-regulating repeaters, 

(II) The design of a system having a y(p) which leads 
to satisfactory transient behavior. 

The rest of the memorandum will be concerned largely 
with the first of these. The calculations will be carried out 
in general terms and the different types of possible responses 
will be described in terms of the characteristics of y(p).

* Transit time between repeaters is neglected throughout this 
memorandum. More exactly, we choose a different origin of time 
at each repeater, so that the transit time does not appear ex- 
plicitly in the formulae. 


Mathematically, the problem discussed in this memorandum can be formulated as follows: if y(p) represents the common steady-state transfer ratio of the four-terminal linear units shown connected in tandem in Figure 1, the output voltage response $V_n(t)$ of the n-th unit is given by the inverse Laplace integral

$$V_n(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} y(p)^n\,e^{pt}\,V_0(p)\,dp, \qquad (1)$$

where $V_0(p)$ represents the spectrum of the input voltage.

For an impulsive input of intensity $V_0$ applied at time t = 0,

$$V_0(p) = V_0.$$

For a step function input of height $V_0$ applied at time t = 0,

$$V_0(p) = V_0/p.$$


Specifically, this memorandum will be devoted to the 
study of the behavior of V n (t) for large values of n. 

Four-terminal networks are normally classed as low-, band-, or high-pass depending upon the character of |y(iω)|. Typical examples of |y(iω)| are shown in Figure 4a, in which, following the usual practice, |y(iω)| has been normalized to be unity at ω = 0 in the low-pass case; at $\omega = \omega_0$ (the mid-band frequency) in the band-pass case; and at ω = ∞ in the high-pass case.

From the viewpoint of the asymptotic behavior of the system in Figure 1, it is convenient to modify this classification somewhat when speaking of the over-all gain characteristic, $|y(i\omega)|^n$, of the transfer ratio of a system comprised of n units. For sufficiently large n, it is clear that $|y(i\omega)|^n$ would lead to curves of the type shown in Figure 4b corresponding to the low-pass, band-pass, and high-pass curves of Figure 4a. Thus, for sufficiently large n, the gain curves B*, C*, and D* of Figure 4b are seen to exhibit the type of behavior normally associated with a band-pass characteristic. A* and E*, on the other hand, exhibit behavior of the type normally classified as low-pass and high-pass, respectively. For these reasons, the terms low-pass and high-pass will henceforth be reserved for those gain characteristics which are always less than their values at ω = 0 and ω = ∞, respectively. The term band-pass will be used to cover all other cases, namely, those in which |y(iω)| possesses one or more maxima at finite frequencies, the values of which exceed the values of |y(iω)| at both zero and infinity.
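The generalized classification can be applied mechanically by sampling |y(iω)| on a logarithmic frequency grid and comparing each sample with the gain near ω = 0 and near ω = ∞. The sketch below does this; the grid limits, the use of ω = 10⁻⁹ and 10⁹ as stand-ins for zero and infinity, and the function name are all assumptions, so it is a heuristic check rather than a proof.

```python
def classify(y, lo=1e-3, hi=1e3, samples=2000):
    """Heuristic generalized classification of |y(i w)| (see lead-in for assumptions)."""
    g_zero = abs(y(complex(0.0, 1e-9)))   # stand-in for |y(0)|
    g_inf = abs(y(complex(0.0, 1e9)))     # stand-in for |y(i infinity)|
    grid = [abs(y(complex(0.0, lo * (hi / lo) ** (k / (samples - 1)))))
            for k in range(samples)]
    if all(g < g_inf for g in grid):
        return "high-pass"
    if all(g < g_zero for g in grid):
        return "low-pass"
    return "band-pass"


examples = {
    "p/(p+1)": lambda p: p / (p + 1),
    "1/(p+1)": lambda p: 1 / (p + 1),
    "p(p+1.4)/(p+1)^2": lambda p: p * (p + 1.4) / (p + 1) ** 2,
    "p(p+2)/(p+1)^2": lambda p: p * (p + 2) / (p + 1) ** 2,
}
for name, y in examples.items():
    print(name, classify(y))
```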

History of the Problem

Several people have considered this problem in the 
above mathematical form. Before proceeding to a discussion of 
the results of the general theory, it will be instructive to 
consider a few illustrative examples of their results. 


(1)* y(p) = p/(p + 1)

The gain characteristic is clearly of the high-pass type and satisfies (1) and (2) of Page 6. If the input voltage is a unit step, then, by the theorem of residues,

$$V_n(t) = \operatorname{Res}_{p=-1}\left[\frac{p^{\,n-1}e^{pt}}{(p+1)^n}\right] = e^{-t}L_{n-1}(t), \qquad (2)$$

where $L_{n-1}(t)$ denotes the Laguerre polynomial of degree n − 1. A plot of $V_n(t)$ for n = 1, 2, ..., 10 is shown in Figure 5. It is known that for large n

$$L_n(t) \simeq \frac{1}{\sqrt{\pi}}\,e^{t/2}\,(nt)^{-1/4}\cos\left(2(nt)^{1/2} - \frac{\pi}{4}\right),$$

where ≃ is to be interpreted as "asymptotically equal to."

*This example was first treated by L. A. MacColl (MM-39-325-166), 9/11/39, and W. H. Wise (MM-38-343-22), 8/2/38. The above treatment follows that of MacColl.

A plot of the approximate "envelope"

$$\frac{1}{\sqrt{\pi}}\,e^{-t/2}\,(nt)^{-1/4}$$

is given for n = 50, 100, 150, 200, and 250 in Figure 6.
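The Laguerre-polynomial result can be confirmed by simulating the chain directly. Each section y(p) = p/(p + 1) obeys y_k′ + y_k = y_{k−1}′, and with a unit step at the input every stage starts at 1 (the high-pass sections pass the initial jump unchanged). The sketch integrates the n coupled equations with a fixed-step fourth-order Runge-Kutta method and compares the last output with e^{−t}L_{n−1}(t), computed from the standard three-term Laguerre recurrence. Step count and helper names are arbitrary choices.

```python
import math


def laguerre(k, t):
    """Laguerre polynomial L_k(t) from (j+1) L_{j+1} = (2j+1-t) L_j - j L_{j-1}."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - t) * cur - j * prev) / (j + 1)
    return cur


def closed_form(n, t):
    """V_n(t) = e^{-t} L_{n-1}(t) for a unit step into n sections p/(p+1)."""
    return math.exp(-t) * laguerre(n - 1, t)


def simulate_chain(n, t_end, steps=4000):
    """Classical RK4 for y_1' = -y_1, y_k' = y_{k-1}' - y_k, all y_k(0+) = 1."""
    def deriv(y):
        d = [0.0] * n
        d[0] = -y[0]                 # the step input is constant for t > 0
        for k in range(1, n):
            d[k] = d[k - 1] - y[k]   # (p + 1) Y_k = p Y_{k-1}
        return d

    y = [1.0] * n
    h = t_end / steps
    for _ in range(steps):
        k1 = deriv(y)
        k2 = deriv([yi + 0.5 * h * di for yi, di in zip(y, k1)])
        k3 = deriv([yi + 0.5 * h * di for yi, di in zip(y, k2)])
        k4 = deriv([yi + h * di for yi, di in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y[-1]


for n in (1, 2, 5, 10):
    print(n, simulate_chain(n, 3.0), closed_form(n, 3.0))
```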

The response in this case is seen to be both amplitude and frequency modulated, the "instantaneous frequency" in the sense of frequency-modulation theory being given by

$$\omega_i = \frac{d}{dt}\left[2(nt)^{1/2}\right] = \left(\frac{n}{t}\right)^{1/2},$$

while the envelope of the amplitude modulation is approximately exponential. In particular, the type of behavior found here can be considered satisfactory, since there is no tendency for the magnitude of the largest overshoot to increase without limit as the number of repeaters is increased. As will be shown later, this type of behavior is typical of any network having a high-pass characteristic in the generalized sense of that term as it has been defined above.
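The quality of the large-n approximation can be gauged by comparing e^{−t}L_{n−1}(t) with (1/√π)e^{−t/2}(nt)^{−1/4}cos(2(nt)^{1/2} − π/4). The sketch below reports the error as a fraction of the envelope. The chosen values of n and t are arbitrary, and the agreement improves only slowly as n grows, so this is an illustration rather than an error bound.

```python
import math


def laguerre(k, t):
    """Laguerre polynomial L_k(t) via the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 - t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - t) * cur - j * prev) / (j + 1)
    return cur


def exact(n, t):
    """Unit-step response of n sections p/(p+1): e^{-t} L_{n-1}(t)."""
    return math.exp(-t) * laguerre(n - 1, t)


def asymptotic(n, t):
    """Return (approximation, envelope) from the large-n Laguerre formula."""
    env = math.exp(-t / 2) * (n * t) ** -0.25 / math.sqrt(math.pi)
    return env * math.cos(2 * math.sqrt(n * t) - math.pi / 4), env


for n, t in [(200, 0.5), (2000, 0.25), (4000, 0.25)]:
    approx, env = asymptotic(n, t)
    print(n, t, exact(n, t), approx, abs(exact(n, t) - approx) / env)
```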

In MM-40-3500-92, dated 10/14/1940, J. G. Kreer and J. H. Bollman concluded that the appropriate y(p) for a self-regulating repeater employing a directly heated thermistor element in the control device was given by

It should be observed that for a ≠ 0 this transfer ratio does possess static error. L. A. MacColl, in MM-40-130-270, treated this case for |a| < 1 and found that the system exhibited essentially the same type of satisfactory behavior as that discussed above.


(2)* A slightly more complicated example is given by

$$y(p) = \frac{p(p+a)}{(p+1)^2}.$$

Since $|y(i\omega)|^2 = \omega^2(\omega^2 + a^2)/(\omega^2+1)^2$, which is less than 1 exactly when $(a^2 - 2)\,\omega^2 < 1$, it is easily seen that for a < √2, |y(iω)| is a high-pass characteristic, in that |y(iω)| < 1 for all finite ω and |y(iω)| → 1 as ω → ∞. On the other hand, if a > √2, |y(iω)| possesses a maximum greater than 1 at some finite frequency. |y(iω)| is illustrated by curve I in Figure 7 for a = 1.4 (high-pass) and by Figure 8 for a = 2 (band-pass). The response $V_n(t)$ to a unit step function is shown in Figures 9 and 10 for these two cases with n = 1, 2, ..., 9. The character of the response is seen to be radically different for these two values of a.

For a = 1.4 the response is seen to be of the same type as that encountered in the first example. For a = 2, on the other hand, it seems to represent an oscillation in which the magnitude of the largest overshoot increases without limit as n tends to infinity. Later it will be shown that this is in fact the case and that satisfactory operation is impossible for a large number of repeaters in this case.
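The boundary a = √2 between the two cases follows from |y(iω)|² = ω²(ω² + a²)/(ω² + 1)², whose derivative with respect to ω² vanishes at ω² = a²/(a² − 2) when a² > 2. The sketch below evaluates the peak gain for the two values of a used in Figures 7 to 10, and also scans a frequency grid (an arbitrary choice) to confirm that no sample exceeds 1 when a = 1.4.

```python
import math


def gain(omega, a):
    """|y(i omega)| for y(p) = p (p + a) / (p + 1)^2."""
    p = complex(0.0, omega)
    return abs(p * (p + a) / (p + 1) ** 2)


def peak(a):
    """(omega_0, peak gain) when a^2 > 2, else None (high-pass, no finite maximum)."""
    if a * a <= 2:
        return None
    w0 = math.sqrt(a * a / (a * a - 2))
    return w0, gain(w0, a)


def max_on_grid(a, lo=1e-3, hi=1e3, samples=4001):
    """Largest sampled gain on a logarithmic grid."""
    return max(gain(lo * (hi / lo) ** (k / (samples - 1)), a) for k in range(samples))


print(peak(1.4), max_on_grid(1.4))
print(peak(2.0), max_on_grid(2.0))
```

For a = 2 the peak gain is 2/√3 ≈ 1.155 at ω = √2, so after n = 100 sections the gain at that frequency is (4/3)^50, roughly 1.8 × 10^6, which is consistent with the unbounded growth of the overshoot described above.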

From this and other considerations, L. A. MacColl conjectured that a necessary and sufficient condition that the response $V_n(t)$ be bounded for all n was that the transfer ratio y(p) have no net gain at any frequency. Mathematically expressed, a necessary and sufficient condition that

$$|V_n(t)| < M \quad \text{for all } n,$$

where M is independent of n and t, is that

$$\text{(M)} \qquad |y(i\omega)| \le 1 \quad \text{for all real frequencies } \omega.$$

Physically, the condition on y(iω) prevents the transfer ratio $|y(i\omega)|^n$ of a system using n units from having an arbitrarily large gain at any particular frequency.

*This case was also treated by L. A. MacColl, but no memorandum on it was ever written.

In one sense this memorandum could be summarized as a proof of this conjecture. In particular, a direct proof of the necessity of MacColl's condition (M) is given in the second part. The remainder of that part is devoted to an indirect proof of the sufficiency. The argument consists in exhibiting the two types of possible responses: the first being that associated with a y(p) satisfying MacColl's condition, and the second that resulting from a y(p) which violates it at one or more frequencies.

Statement of Results 

The detailed results of the sufficiency argument are conveniently discussed in terms of the generalized characterization of high-, band-, and low-pass y(p)'s as given on page 8. The results will be taken up in that order.

High Pass

In terms of the above classification, the class of high-pass y(p)'s consists of just those functions which satisfy MacColl's condition, and these are therefore the ones from which a satisfactory response could be expected. For the y(p)'s in this class, it is clear on physical grounds that the maximum contribution to the response $V_n(t)$ of equation (1) will come from the large values of |ω|, since for these values of |ω|, $|y(i\omega)|^n$ remains close to 1, while for all other values of |ω|, $|y(i\omega)|^n \to 0$. Using the first three terms of the Laurent expansion of y(iω) about ω = ∞, one finds:

$$y(i\omega) \simeq 1 + \frac{ia}{\omega} + \frac{b}{\omega^2}, \qquad (5)^*$$

$$|y(i\omega)| \simeq 1 + \frac{a^2 + 2b}{2\omega^2}, \qquad (6)$$

$$\arg y(i\omega) \simeq \frac{a}{\omega}. \qquad (7)$$

*It is assumed that a > 0, b < 0, and that 2b + a² ≤ 0. These assumptions correspond to a second-order maximum at |ω| = ∞ and to a monotonic decreasing phase function for y(p) as |ω| → ∞.

If these approximations, which are valid for |ω| sufficiently large, are introduced into equation (1), it can be shown that the principal contribution to $V_n(t)$ for a unit step input is given by:

$$V_n(t) \simeq \frac{1}{\sqrt{\pi}}\,(nat)^{-1/4}\exp\left[\frac{a^2+2b}{2a}\,t\right]\cos\left(2\sqrt{nat} - \frac{\pi}{4}\right). \qquad (8)$$

This, with a suitable interpretation of the constants a and b, is seen to be of the same general form as the response obtained by MacColl for y(p) = p/(p + 1) as given by equation (2). Just as in that example, the response is both frequency and amplitude modulated. The instantaneous frequency of oscillation is again given by

$$\omega_i = \frac{d}{dt}\left[2(nat)^{1/2}\right] = \left(\frac{na}{t}\right)^{1/2}.$$
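The constants a and b can be read off numerically from y(iω) at one large frequency, using the sign convention y(iω) ≈ 1 + ia/ω + b/ω². The sketch below does this (a crude estimate whose error is of order 1/ω²) and evaluates the large-n step response (1/√π)(nat)^{−1/4} exp[((a² + 2b)/(2a))t] cos(2√(nat) − π/4). For y(p) = p/(p + 1) it gives a = 1, b = −1, and the formula then reduces to MacColl's result. Function names are hypothetical.

```python
import math


def laurent_ab(y, omega=1e3):
    """Estimate a and b in y(i w) ~ 1 + i a/w + b/w^2 from a single large w.

    The neglected terms are O(1/w^2) relative to a and b.
    """
    d = y(complex(0.0, omega)) - 1.0
    return omega * d.imag, omega ** 2 * d.real


def step_response_eq8(n, t, a, b):
    """Large-n step response of a high-pass chain in terms of a and b."""
    return (math.pi ** -0.5 * (n * a * t) ** -0.25
            * math.exp((a * a + 2 * b) * t / (2 * a))
            * math.cos(2 * math.sqrt(n * a * t) - math.pi / 4))


a1, b1 = laurent_ab(lambda p: p / (p + 1))
a2, b2 = laurent_ab(lambda p: p * (p + 1.4) / (p + 1) ** 2)
print(a1, b1, a1 * a1 + 2 * b1)   # close to 1, -1, -1
print(a2, b2, a2 * a2 + 2 * b2)   # close to 0.6, -0.2, -0.04
```

The quantity a² + 2b sets the exponential decay rate of the envelope; it is negative for both networks, as the footnote to (5) requires.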


The gain for 

y(p) = P(P i 

(P I D 2 

is shown as curve I of Figure 11. Curve II of this figure represents $|y(i\omega)|^{100}$ for this y(p). For this example and n = 100, the true gain $|y(i\omega)|^{100}$ and the gain approximation resulting from equation (6) are indistinguishable on the scale of Figure 11.

The corresponding phase characteristic for $y(p)^{100}$ is plotted on Figure 12, where, for reasons which will appear in Part II, the actual frequency has been replaced by

w» = ^_ . 


Again, on the scale of Figure 12 the actual phase is indistinguishable from the approximation resulting from equation (7). Figs. 7 and 13 present the same information for

$$y(p) = \frac{p(p+1.4)}{(p+1)^2}$$

and n = 100.

Again the agreement between the actual phase and the approximation is excellent. However, there is a considerable error in the gain approximation for small |ω|. This large error is unquestionably due to the fact that the value a = 1.4 is near the critical value a = √2 at which the characteristic changes from high-pass to band-pass.

Agreement with the above asymptotic formula can of 
course be obtained by increasing n sufficiently. Alternately, 
for n = 100, a better approximation to the gain can be obtained 
by writing 

y( iu) = 1 + 

a i 



~2 + 



ly(iu)l = 

l + 

2b + a 

2d + b + 2ac 


' I/ 2 

This approximation leads to a curve which is indistinguishable from that of |y(iω)|^100 in Figure 7. With this approximation, one finds the following expression for V_n(t) when the input is a unit step function:


$$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\left\{1 + \frac{(2d+b^2+2ac)t^2}{2a^2n} + \cdots\right\} \tag{9}$$

This expression is seen to approach that given by equation (8) as n → ∞. Thus one can conclude that the response will always be satisfactory if y(p) belongs to the class of high-pass characteristics.
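Equation (8) can be checked numerically on MacColl's example y(p) = p/(p + 1). Expanding y(iω) = 1/(1 − i/ω) gives a = 1 and b = −1. The step response of n sections is the inverse transform of p^(n−1)/(p + 1)^n, which is e^(−t) L_{n−1}(t), where L_k is the Laguerre polynomial. The following is a minimal sketch in standard-library Python, with hypothetical function names. It evaluates that exact response by the Laguerre three-term recurrence and compares it with equation (8).

```python
import math


def exact_step_response(n: int, t: float) -> float:
    """Exact step response of n cascaded sections with y(p) = p/(p + 1).

    y(p)**n / p = p**(n-1) / (p + 1)**n inverts to exp(-t) * L_{n-1}(t),
    where L_k is the Laguerre polynomial of degree k.
    """
    lag_prev, lag = 1.0, 1.0 - t  # L_0(t), L_1(t)
    if n == 1:
        return math.exp(-t) * lag_prev
    # (k + 1) L_{k+1} = (2k + 1 - t) L_k - k L_{k-1}, carried up to L_{n-1}.
    for k in range(1, n - 1):
        lag_prev, lag = lag, ((2 * k + 1 - t) * lag - k * lag_prev) / (k + 1)
    return math.exp(-t) * lag


def envelope(n: int, t: float, a: float = 1.0, b: float = -1.0) -> float:
    """Envelope pi^(-1/2) (n a t)^(-1/4) exp((a^2 + 2b) t / 2a) of equation (8)."""
    return (n * a * t) ** -0.25 / math.sqrt(math.pi) * math.exp((a * a + 2 * b) * t / (2 * a))


def asymptotic_step_response(n: int, t: float, a: float = 1.0, b: float = -1.0) -> float:
    """Equation (8): the envelope times cos(2 sqrt(n a t) - pi/4)."""
    return envelope(n, t, a, b) * math.cos(2 * math.sqrt(n * a * t) - math.pi / 4)


n = 1000
for t in (0.5, 1.0, 2.0):
    print(f"t = {t}: exact {exact_step_response(n, t):+.5f}, "
          f"eq. (8) {asymptotic_step_response(n, t):+.5f}, envelope {envelope(n, t):.5f}")
```

The error is measured against the envelope rather than the instantaneous value. This is because the cosine passes through zero, and a relative error there would be meaningless.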

Band-Pass Case 

MacColl's condition is clearly violated whenever |y(iω)| has one or more relative maxima greater than 1 at finite frequencies. For simplicity the case where |y(iω)| has only one such maximum, at ω = ω0, will be treated first. It will furthermore be assumed that this maximum is of the second order; i.e.

$$\left.\frac{d^2}{d\omega^2}|y(i\omega)|\right|_{\omega=\omega_0} \neq 0.$$

Under these conditions, it is physically clear that the maximum contribution to the response V_n(t) as given by equation (1) will be due to those frequencies near ω0, at which |y(iω)| possesses its maximum, since as n increases this region becomes increasingly more important than all the rest. It is also clear that the time of maximum response will be given by the delay time experienced by the frequency ω0 in passing through the chain of networks. This is known to be given by t0 = −nB'(ω0), where B'(ω0) denotes the slope of the phase characteristic B(ω) in the expression

$$y(i\omega) = A(\omega)\exp\left(iB(\omega)\right).$$

If A(ω) and B(ω) are expanded in a Taylor's series about ω = ω0 and terms up to the second order are retained, it can be shown that the response to a unit impulse function is given by

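$$V_n(t) \approx A(\omega_0)^n\left(\frac{2}{\pi n}\right)^{1/2}G(\omega_0)\exp\left[-\frac{(t-t_0)^2}{2nH(\omega_0)}\right]\cos\left[\omega_0 t + nB(\omega_0) + \theta(\omega_0) - \frac{(t-t_0)^2B''(\omega_0)G(\omega_0)^4}{2n}\right] \tag{11}$$

where

$$G(\omega_0) = \left[\left(\frac{A''(\omega_0)}{A(\omega_0)}\right)^2 + B''(\omega_0)^2\right]^{-1/4}, \qquad H(\omega_0) = -\frac{A''(\omega_0)}{A(\omega_0)} - \frac{A(\omega_0)B''(\omega_0)^2}{A''(\omega_0)},$$

$$\theta(\omega_0) = \frac{1}{2}\arctan\left[-\frac{A(\omega_0)B''(\omega_0)}{A''(\omega_0)}\right], \qquad t_0 = -nB'(\omega_0).$$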

Thus V_n(t) can be interpreted as an amplitude modulated wave with an envelope proportional to the Gauss error curve

$$\exp\left[-\frac{(t-t_0)^2}{2nH(\omega_0)}\right]$$

with a standard deviation given by

$$\sigma = \left[nH(\omega_0)\right]^{1/2} = \left\{n\left[-\frac{A''(\omega_0)}{A(\omega_0)} - \frac{A(\omega_0)B''(\omega_0)^2}{A''(\omega_0)}\right]\right\}^{1/2}.$$

The standard deviation σ is of course a convenient measure of the duration of the disturbance. The maximum response occurs at time t0 = −nB'(ω0), at which time the amplitude is proportional to

$$\frac{A(\omega_0)^n}{\sqrt{n}}.$$

Thus if A(ω0) > 1, the maximum response will represent a value which is very large compared with unity, the magnitude of the original disturbance, if n is large. This would force any system involving vacuum tubes to overload if n were sufficiently large.

These properties are summarized in Figures 14 and 15. Figure 14 is a plot of the response for values of t near t0, for a few values of n, for the example given by equation (4) with a = 2. Figure 15 is a plot of the maximum response for a few values of n and for different values of the parameter a.
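The quantities that govern equation (11) can be computed directly for the example of equation (4). The sketch assumes that equation (4) is y(p) = p(p + a)/(p + 1)^2, which reproduces the gain quoted below for a = 2. Differentiating |y(iω)|^2 = ω^2(ω^2 + a^2)/(ω^2 + 1)^2 gives a peak at ω0^2 = a^2/(a^2 − 2) with A(ω0) = a^2/(2(a^2 − 1)^{1/2}). For a = 2 this means ω0 = √2, A(ω0) = 2/√3 ≈ 1.155 and −B'(ω0) = 1/3, so t0 = n/3. The Python below uses hypothetical helper names. It finds the peak numerically by golden-section search and shows how quickly A(ω0)^n grows.

```python
import math


def gain(w: float, a: float) -> float:
    """A(w) = |y(iw)| for y(p) = p (p + a) / (p + 1)^2 (assumed form of equation (4))."""
    p = 1j * w
    return abs(p * (p + a) / (p + 1) ** 2)


def phase(w: float, a: float) -> float:
    """B(w) = arg y(iw) on the continuous branch for w > 0 and a > 0."""
    return math.pi / 2 + math.atan(w / a) - 2 * math.atan(w)


def gain_peak(a: float, lo: float = 1e-6, hi: float = 100.0, iters: int = 200) -> float:
    """Golden-section search for the single gain maximum w0 (band-pass case, a > sqrt(2))."""
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iters):
        if gain(x1, a) < gain(x2, a):
            lo, x1 = x1, x2          # maximum lies in [x1, hi]
            x2 = lo + g * (hi - lo)
        else:
            hi, x2 = x2, x1          # maximum lies in [lo, x2]
            x1 = hi - g * (hi - lo)
    return (lo + hi) / 2


def delay_per_section(w: float, a: float, h: float = 1e-5) -> float:
    """-B'(w) by a central difference; the time of maximum response is t0 = n times this."""
    return -(phase(w + h, a) - phase(w - h, a)) / (2 * h)


a = 2.0
w0 = gain_peak(a)
A0 = gain(w0, a)
tau = delay_per_section(w0, a)
print(f"w0 = {w0:.6f}, A(w0) = {A0:.6f}, -B'(w0) = {tau:.6f}")
for n in (10, 50, 100):
    print(f"n = {n:3d}: t0 = {n * tau:6.2f}, A(w0)^n / sqrt(n) = {A0 ** n / math.sqrt(n):.3e}")
```

For a below √2 the gain has no interior maximum, which is the high-pass case. In that situation the search simply runs to the upper end of its interval.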

It should be remarked that the above approximation to the gain, which was obtained by keeping only the first two terms of the expansion of A(ω) about ω = ω0, could only be expected to be a reasonable one for fairly large values of n, since it represents a usually unsymmetrical gain characteristic by a symmetric function. A better, or second, approximation can be obtained by using three terms of the Taylor's expansion instead of two. Just as in the high-pass case, the retention of this extra term gives rise to a second term in the expression for V_n(t). However, it does not fundamentally alter the characteristics of the response, since the correction term vanishes for t = t0, at which time the response is still a maximum, with the same amplitude as before. Its only effect is to take cognizance of the unsymmetrical character of the gain characteristic A(ω) and to change the resulting response envelope to an unsymmetrical one. Of course, it also modifies the phase of the oscillation inside the envelope in a complicated way without changing the fundamental frequency of oscillation.


For these reasons and because of the complexity of the 
resulting expression, it will not be written down here explicitly 
although the explicit approximation to the gain A(w) will be 
discussed in Part II. 

The two approximations to the gain are illustrated for equation (4) with a = 2 in Figure 16 for n = 100. In this case

$$A(\omega) = \frac{|\omega|\left(\omega^2+4\right)^{1/2}}{\omega^2+1}.$$

As can be seen from the figure, the second approximation does in fact represent A(ω) over the significant range of frequencies near ω0, from which it can be concluded that the response will be unsatisfactory. Figure 14, previously referred to, furnishes a picture of the envelope response as obtained from the first approximation.
In the event that A(ω) takes on its maximum value at more than one place in the finite frequency range, it is clear that the above results can be generalized as follows.

Let V_ni(t) be the response of the form given by equation (11) due to a maximum at ω = ωi, and let the time of maximum response from this maximum be denoted by ti = −nB'(ωi). Then, if there are k relative maxima, the total response is clearly given by the expression

$$V_n(t) = \sum_{i=1}^{k} V_{ni}(t).$$

Unless the values of A(ω) at the points ω = ωi are nearly the same, it is also clear that only those terms of the above sum which correspond to the largest maxima of A(ω) will be of significance.

The band-pass case is also discussed briefly for unit 
step inputs in Part II. 

Low Pass Case 

Since the low-pass case differs from the band-pass case only in that A(ω) has its maximum at ω = 0 instead of at ω = ω0, the results of the two are very similar. The results in the low-pass case are simpler because, it will be recalled, B(ω) (as defined by equation (10)) is an odd function of ω for any physical network. This forces both B(0) and B''(0) to be zero, so that for an impulsive input one obtains the simple formula

$$V_n(t) \approx A(0)^n\left[\frac{A(0)}{-2\pi nA''(0)}\right]^{1/2}\exp\left[\frac{(t-t_0)^2A(0)}{2nA''(0)}\right], \qquad t_0 = -nB'(0). \tag{12}$$

This result corresponds to the well-known formula from 
transmission line theory for non-distortionless lines. 
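Equation (12) can be tried on the simplest low-pass chain: n identical sections with y(p) = 1/(1 + p). This example is assumed for illustration and is not treated in the text. Here A(ω) = (1 + ω^2)^{−1/2}, so A(0) = 1 and A''(0) = −1, and B(ω) = −arctan ω gives t0 = −nB'(0) = n. The exact impulse response is t^(n−1) e^(−t)/(n − 1)!, and equation (12) reduces to a Gaussian of mean n and variance n. The following is a short standard-library Python sketch with hypothetical names.

```python
import math


def exact_impulse(n: int, t: float) -> float:
    """Impulse response of n sections y(p) = 1/(1 + p): t^(n-1) e^(-t) / (n-1)!."""
    return math.exp((n - 1) * math.log(t) - t - math.lgamma(n))


def eq12(n: int, t: float, A0: float = 1.0, A2: float = -1.0, B1: float = -1.0) -> float:
    """Equation (12) with A(0) = A0, A''(0) = A2 (< 0) and B'(0) = B1."""
    t0 = -n * B1
    scale = A0 ** n * math.sqrt(A0 / (-2 * math.pi * n * A2))
    return scale * math.exp((t - t0) ** 2 * A0 / (2 * n * A2))


n = 100
for t in (80, 90, 100, 110, 120):
    print(f"t = {t}: exact {exact_impulse(n, t):.5f}, eq. (12) {eq12(n, t):.5f}")
```

The remaining mismatch away from t = n comes from the skewness of the exact response, which a symmetric Gaussian cannot represent. This is the same limitation the text notes for the first approximation in the band-pass case.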


From the practical viewpoint the above results have the following implications for communication systems such as a cross-country coaxial telephone system employing self-regulating repeaters spaced at intervals of a few miles.

(1) If the transfer characteristic of each individual 
network is of the high-pass type (in the sense in which this term 
has been used above) then the transient response will never exceed 
the initial value of the disturbing input voltage and it will 
be damped out so that the operation of the communication system 
would generally be considered satisfactory. 


(2) If the network is not of the high-pass type (the usual practical case), and there is any net gain in the system peaked at ω0, then for even a small number of units the response will exceed the initial input at the time given by

$$t_0 = -nB'(\omega_0), \qquad A'(\omega_0) = 0,$$

and if the number of units is sufficiently large the output from the n-th unit will be large enough to cause severe overloading.

At first glance these implications are not promising. They seem to indicate that the operation of a cross-country system involving several hundred repeaters and regulators would be extremely difficult, since the only satisfactory characteristic is difficult to attain in practice. However, the ideal high-pass characteristic can be approached in practice, in the sense that the peak frequency can be made very large. Thus the maximum response may occur so soon after the initial disturbance that the physical system would not be able to follow it or to distinguish it from the initial disturbance, which in many cases would be large enough to cause momentary overloading of the

Moreover, it is an experimental fact that, in the design of feedback regulator characteristics, forcing the peak frequency higher reduces the size of the peak, which in turn permits the use of a larger number of regulators in the system.

If this is done, the time of maximum response, t0 = −nB'(ω0), will be small, since |B'(ω)| is in general small for large ω. Assuming that the effects of the maximum response have been treated in this way, it is natural to inquire into the type of response which will result for finite values of t > t0.

If one examines the gain characteristic curve of the type shown in Figure 7, it is clear that for frequencies less than some frequency ω1, slightly less than the peak frequency ω0, the shape is fundamentally like that of the high-pass case. The delay of a frequency through a linear network is given by the (negative) slope of the phase characteristic at that frequency. It is therefore clear that the response for values of t greater than t0, the time of maximum response, will come from the frequencies less than ω0, since the phase slope is large for small frequencies and small for large frequencies. Now if it is assumed that the phase characteristic nB(ω) is a monotonic decreasing function of ω, the function nB(ω) + ωt will always be stationary at an arbitrary frequency ω, provided that t is given the corresponding value t = −nB'(ω). Thus, it is reasonable to expect that the response for t ≫ t0* will exhibit the same type of character as that obtained in the high-pass case discussed above. This, it will be recalled, is both frequency and amplitude modulated, with an envelope which decreases approximately exponentially. Thus, under these circumstances it seems reasonable to suppose that satisfactory operation of the communication link could be obtained.

To recapitulate, the most practical design for any system of the type envisaged in Figure 1, from the viewpoint of satisfactory transient response, involves approaching the high-pass characteristic as closely as possible. This is done by making the gain characteristic of the transfer ratio peak at as high a frequency as is practicable and by keeping the phase slope characteristic monotonic for all smaller frequencies.


Mathematical Discussion 

Theorem I. A necessary condition that the response V_n(t) from a chain of n four-terminal linear invariable networks subject to a unit step input function have a common finite bound for all n is that the transfer ratio y(p) satisfy the relation

$$|y(i\omega)| \le 1 \quad \text{for all real values of } \omega. \tag{M}$$

* A different type of expansion, valid for any fixed t as n → ∞, is discussed at the end of Part II.



Proof: By hypothesis

$$|V_n(t)| < M \quad \text{for all } n,$$

where M is independent of n and t, so that

$$V_n(p) = \int_0^\infty e^{-pt}V_n(t)\,dt = \frac{y(p)^n}{p},$$

$$|y(p)|^n = |p|\left|\int_0^\infty e^{-pt}V_n(t)\,dt\right| \le |p|\int_0^\infty e^{-ct}|V_n(t)|\,dt \le |p|M\int_0^\infty e^{-ct}\,dt.$$

If p = c + iω and if c > 0, then

$$|p|M\int_0^\infty e^{-ct}\,dt = \frac{M\left(c^2+\omega^2\right)^{1/2}}{c},$$

so that

$$\log|y(p)| \le \frac{1}{n}\log\frac{M\left(c^2+\omega^2\right)^{1/2}}{c}.$$

Thus, in the limit as n → ∞, it follows that for any p with a positive real part

$$\log|y(p)| \le 0$$

and hence

$$|y(p)| \le 1.$$

Since this relation holds everywhere in the right-hand half plane, it follows from simple continuity considerations that the maximum of |y(iω)| never exceeds 1. Thus

$$|y(i\omega)| \le 1,$$

as was to be shown.

The remaining discussion will be devoted to the 
characterization of the different types of possible responses 
and will, at the same time, furnish an indirect proof of the 
fact that the condition (M) on y(p) is also sufficient. 

High Pass Case - Unit Step Input 

If the networks comprising the system shown in Figure 1 possess a transfer ratio having a high-pass gain characteristic in the sense defined above, and if one writes

$$y(i\omega) = A(\omega)e^{iB(\omega)},$$

then the gain function A(ω) satisfies the two conditions

(A) A(ω) < 1 for all finite frequencies ω.

(B) A(ω) → 1 as ω → ∞.

Under these conditions it is clear that, for sufficiently large n, the main contributions to V_n(t) will be due to the high values of |ω|. For convenience, V_n(t) is written here in slightly different form:

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\int_0^\infty A(\omega)^n e^{i[nB(\omega)+\omega t]}\,\frac{d\omega}{i\omega}$$


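For large values of |ω|, all physical transfer ratios y(iω) of interest to us here can be represented by an expansion of the form*

$$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4} + \cdots \tag{13}$$

We shall confine our attention to the ordinary case, in which a > 0, b < 0 and 2b + a^2 < 0. For large values of |ω|, we now have

$$A(\omega) = \left\{\left[1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots\right]^2 + \left[\frac{a}{\omega} + \frac{c}{\omega^3} + \cdots\right]^2\right\}^{1/2} \tag{14}$$

$$B(\omega) = \arctan\frac{\dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots}{1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots} \tag{15}$$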

It is clear that, for |ω| sufficiently large, the leading terms of these expressions will furnish adequate approximations to A(ω) and B(ω). These are:

$$A(\omega) = \left[1 + \frac{a^2+2b}{\omega^2}\right]^{1/2} \tag{16}$$

$$B(\omega) = \frac{a}{\omega} \tag{17}$$

Let ω0 be the frequency defined by the condition that these approximations are accurate to within the arbitrarily chosen permissible error ε for values of ω such that ω > ω0. Then we can write

* In the usual case y(p) is a rational function, so that this 
expansion can be readily obtained. 


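$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left[\int_0^{\omega_0} + \int_{\omega_0}^{\infty}\right]A(\omega)^n e^{i[nB(\omega)+\omega t]}\,\frac{d\omega}{i\omega} = \frac{1}{\pi}\,\mathrm{Re}\,(I_1 + I_2).$$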

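It is clear that

$$|I_1| \le \int_0^{\omega_0}\frac{A(\omega)^n}{\omega}\,d\omega.$$

Since A(ω)^n → 0 for each ω in the finite range 0 < ω < ω0, it is clear that |I1| can be made negligibly small by taking n sufficiently large. Introducing the new variable v defined by the relation

$$v = \omega\left(\frac{t}{na}\right)^{1/2},$$

I2 can be written as

$$I_2 = \int_{\omega_0\sqrt{t/na}}^{\infty}\left[1 + \frac{(a^2+2b)t}{nav^2}\right]^{n/2}e^{i\sqrt{nat}\,(1/v + v)}\,\frac{dv}{iv}$$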


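and, using the binomial expansion, one has

$$\left[1 + \frac{(a^2+2b)t}{nav^2}\right]^{n/2} = 1 + \frac{(a^2+2b)t}{2av^2} + \frac{1}{2}\left[\frac{(a^2+2b)t}{2av^2}\right]^2 + \cdots + \text{terms in } \frac{1}{n} = \exp\left[\frac{(a^2+2b)t}{2av^2}\right] + \text{terms in } \frac{1}{n}.$$

Thus, for sufficiently large n, I2 becomes, approximately,

$$I_2 \approx \int_{\omega_0\sqrt{t/na}}^{\infty}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]e^{i\sqrt{nat}\,(1/v + v)}\,\frac{dv}{iv} \tag{18}$$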

In this form the principle of stationary phase can be applied to I2 (cf. Appendix I). The amplitude factor

$$\frac{1}{iv}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]$$

is independent of n, while the phase function (in the notation of the appendix)

$$\psi(v) = \frac{1}{v} + v \tag{19}$$

is monotonic in the range of integration on each side of the stationary point v = 1, where

$$\psi'(v) = 1 - \frac{1}{v^2} = 0.$$


Physically speaking, the form of equation (18) suggests the interpretation of V_n(t) as the sum of an infinite number of complex waves whose amplitudes are slowly varying functions of v and whose complex phases are rapidly varying functions of v. Under this interpretation it is physically reasonable to expect that the waves will cancel by interference everywhere except near v = 1, where the phase function given by equation (19) is stationary. This is the principle of stationary phase. It remains to evaluate the principal contribution to I2 for values of v near 1.

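Replacing ψ(v) by the first three terms of its Taylor's series about v = 1,

$$\psi(v) = \psi(1) + \psi'(1)(v-1) + \frac{\psi''(1)}{2}(v-1)^2 = 2 + (v-1)^2,$$

the main contribution to I2 is given by

$$I_2 \approx e^{i[2\sqrt{nat}-\pi/2]}\int_{1-\eta}^{1+\eta}\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]e^{i\sqrt{nat}\,(v-1)^2}\,dv.$$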


In the interval (1 − η, 1 + η) the amplitude factor

$$\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]$$

is substantially constant and may be removed from under the integral sign and evaluated at v = 1. By the reasoning of Appendix I, the contributions to the remaining integral are not appreciably affected if the limits are changed to −∞ and ∞. Letting

ξ = v − 1,

we can then write I2 in the form

$$I_2 \approx \exp\left[\frac{(a^2+2b)t}{2a}\right]e^{i[2\sqrt{nat}-\pi/2]}\int_{-\infty}^{\infty}e^{i\sqrt{nat}\,\xi^2}\,d\xi.$$


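By the known properties of Fresnel integrals

$$\int_{-\infty}^{\infty}e^{i\sqrt{nat}\,\xi^2}\,d\xi = \left(\frac{\pi}{\sqrt{nat}}\right)^{1/2}e^{i\pi/4},$$

and hence

$$I_2 \approx \pi^{1/2}(nat)^{-1/4}\exp\left[\frac{(a^2+2b)t}{2a}\right]e^{i[2\sqrt{nat}-\pi/4]}.$$

Taking the real part and dividing by π, the asymptotic expression for V_n(t) is therefore given by:

$$V_n(t) = \pi^{-1/2}(nat)^{-1/4}\exp\left[\frac{(a^2+2b)t}{2a}\right]\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right), \tag{20}$$

which is equation (8) of Part I.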

A more accurate approximation to the gain A(ω)^n is given by

$$A(\omega)^n = \left[1 + \frac{2b+a^2}{\omega^2} + \frac{2d+b^2+2ac}{\omega^4}\right]^{n/2},$$

where the first three terms of equation (13) have been retained. From this it follows that:

$$A(\omega)^n \approx \exp\left[\frac{n}{2}\left(\frac{2b+a^2}{\omega^2} + \frac{2d+b^2+2ac}{\omega^4}\right)\right] = \exp\left[\frac{n(2b+a^2)}{2\omega^2}\right]\exp\left[\frac{n(2d+b^2+2ac)}{2\omega^4}\right],$$

from which it follows that the second approximation is obtained by multiplying the first by the factor

$$\exp\left[\frac{n(2d+b^2+2ac)}{2\omega^4}\right].$$

If the frequency transformation

$$v = \omega\left(\frac{t}{na}\right)^{1/2}$$

is now made, the first factor will, as before, be independent of n, while the second becomes exp[(2d + b^2 + 2ac)t^2/(2a^2nv^4)]. Over the range of integration where the integral is significant, their product can be removed from under the integral sign, giving

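$$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[\frac{(2d+b^2+2ac)t^2}{2a^2n}\right]$$

$$\approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\left\{1 + \frac{(2d+b^2+2ac)t^2}{2a^2n} + \cdots\right\},$$

which is equation (9) of Part I.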
Band Pass Case - Impulsive Input

For simplicity let it be assumed that the gain characteristic A(ω) has only one absolute maximum at ω = ω0 in the positive frequency range, and that this is a second order maximum.


The response V_n(t) can always be written in the form

$$V_n(t) = \frac{A(\omega_0)^n}{\pi}\,\mathrm{Re}\int_0^\infty \exp\left\{n\log\frac{A(\omega)}{A(\omega_0)} + inB(\omega) + i\omega t\right\}d\omega.$$

In this form, V_n(t) can again be interpreted as being proportional to the sum of an infinite number of complex waves of amplitude

$$\left[\frac{A(\omega)}{A(\omega_0)}\right]^n$$

with varying complex phase* given by

$$\varphi(\omega,t,n) = nB(\omega) + \omega t.$$

With this interpretation it is clear that the maximum contribution to V_n(t) will be given by those frequencies in the neighborhood of ω0, where ω0 satisfies A'(ω) = 0, and by values of the time t near t0, at which the phase function φ(ω, t, n) is stationary for the maximum frequency ω0. Thus t0 is given by

$$t_0 = -nB'(\omega_0).$$

Since

A(ω0) ≠ 0 and A'(ω0) = 0,

* "Phase" as used here differs from the way it is normally used in engineering.


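one can write, for a suitably small neighborhood of ω0,

$$\log\frac{A(\omega)}{A(\omega_0)} = \log\left[1 + \frac{A''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)}(\omega-\omega_0)^3 + \cdots\right]. \tag{21}$$

If we retain only the first term of this expansion, then for a suitably restricted neighborhood of ω0 one has

$$n\log\frac{A(\omega)}{A(\omega_0)} \approx \frac{nA''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2. \tag{22}$$

Similarly, for ω sufficiently near ω0,

$$B(\omega) = B(\omega_0) + B'(\omega_0)(\omega-\omega_0) + \frac{B''(\omega_0)}{2}(\omega-\omega_0)^2. \tag{23}$$

Henceforth, for simplicity, we shall write

A = A(ω0), A'' = A''(ω0), B = B(ω0), B' = B'(ω0), B'' = B''(ω0).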

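If these approximations are valid in the neighborhood (ω0 − Δ, ω0 + Δ), it follows that

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{\left[\int_0^{\omega_0-\Delta} + \int_{\omega_0+\Delta}^{\infty}\right]A(\omega)^n e^{i[nB(\omega)+\omega t]}\,d\omega + A^n\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2 + i\left(nB + nB'(\omega-\omega_0) + \frac{nB''}{2}(\omega-\omega_0)^2 + \omega t\right)\right]d\omega\right\}.$$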


Since [A(ω)/A]^n → 0 as n → ∞, except near ω = ω0, it follows as before that the sum of the bracketed integrals can be made negligibly small in comparison with the remaining one if n is taken sufficiently large. Recalling that

$$t_0 = -nB',$$

the remaining integral can be written as

$$V_n(t) \approx \frac{1}{\pi}\,\mathrm{Re}\left\{A^n e^{i(nB+\omega_0 t)}\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\frac{n}{2}\left(\frac{A''}{A} + iB''\right)(\omega-\omega_0)^2 + i(t-t_0)(\omega-\omega_0)\right]d\omega\right\}.$$

Again the finite limits of integration can be replaced by −∞ and ∞ since, for large n,

$$\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2\right]$$

will be small except in the immediate neighborhood of ω0. If one sets

$$\rho = -\frac{n}{2}\left(\frac{A''}{A} + iB''\right), \qquad p = i(\omega-\omega_0), \qquad g = t - t_0,$$

then the remaining integral can be recognized as pair No. 710.0 of the Campbell and Foster tables. Then one finds

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{A^n e^{i(nB+\omega_0 t)}\left(\frac{\pi}{\rho}\right)^{1/2}\exp\left[-\frac{(t-t_0)^2}{4\rho}\right]\right\}.$$

The result is equivalent to that given by equation (11) of Part I. If A(ω0) is greater than 1, it is thus seen that the response will have a maximum value that builds up very rapidly as n increases and would eventually force any system involving vacuum tubes to overload.

It should be remarked that the above approximation to the gain could only be expected to be a reasonable one for fairly large values of n, since it represents a usually unsymmetric gain characteristic by a symmetric function. A better, or second, approximation can be obtained by keeping the second term inside the logarithm in (21) and then retaining only the first term of the resulting expansion of the logarithm in powers of (ω − ω0). This yields

$$n\log\frac{A(\omega)}{A} \approx \frac{nA''}{2A}(\omega-\omega_0)^2 + \frac{nA'''(\omega_0)}{6A}(\omega-\omega_0)^3.$$

The addition of the second term in the above expression gives rise to an additional term in V_n(t), provided that the same phase approximation (23) is retained. The resulting V_n(t) is similar to (11), but the new envelope consists of the old envelope plus nA'''(ω0)/6A times the third derivative of the old envelope. The modulated frequency remains the same, but the phase is changed in a complicated manner. (Compare pair 710.3 of the Campbell and Foster tables.)

Unit Step Input 

In this case one can write

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\int_0^\infty A(\omega)^n e^{i[nB(\omega) + \omega t - \pi/2]}\,\frac{d\omega}{\omega}.$$

As before, the only significant frequencies are in the neighborhood of ω = ω0. Near this point the 1/ω in the denominator can be taken out of the integral as 1/ω0, provided ω0 ≠ 0. Thus the result will be the same as for the impulsive input, apart from the factor 1/ω0, if one makes nB(ω) − π/2 correspond to nB(ω) in (11).
Low-Pass Case 

It is clear that the analysis for this case, in which the equation A'(ω) = 0 is satisfied for ω = 0, can be carried through in exactly the same manner as the band-pass case treated previously. The resulting answer is capable of simplification, however, if it is recalled that B(ω) for any physical network is an odd function of ω. This forces both B(0) and B''(0) to be zero. The resulting formulae then become, with t0 = −nB'(0),

a) Impulsive Input

$$V_n(t) \approx A(0)^n\left[\frac{A(0)}{-2\pi nA''(0)}\right]^{1/2}\exp\left[\frac{(t-t_0)^2A(0)}{2nA''(0)}\right]$$

b) Unit Step Input

$$V_n(t) \approx A(0)^n\left[\frac{A(0)}{-2\pi nA''(0)}\right]^{1/2}\int_0^t\exp\left[\frac{(\tau-t_0)^2A(0)}{2nA''(0)}\right]d\tau \tag{24}$$


This last expression involves an integral, since it is necessary to eliminate the pole at zero where A(ω) has its maximum. This can be done by differentiating V_n(t) with respect to t, finding the asymptotic formula for V_n'(t) as before, and then integrating to obtain (24).
Hamy's Expansions in the Band-Pass Case

The asymptotic expansions so far given for the band-pass case were explicitly designed to represent V_n(t) in the neighborhood of t = t0, where V_n(t) is a maximum. They could in no sense be considered the true asymptotic expansions for values t ≪ t0 or t ≫ t0. In particular, their derivation depended upon the fact that the time of maximum response was related to the number of four-terminal networks by means of the equation

$$t_0 = -nB'(\omega_0),$$

so that as n → ∞, t0 → ∞.

Other types of expansion are clearly possible. Two obvious alternatives are:

(1) Those valid for fixed n as t → ∞;

(2) Those valid for fixed t as n → ∞.

The first of these will not be considered here, since they are of little interest when all of the four-terminal networks have been assumed to be absolutely stable. The interested reader is referred to the book by Doetsch on Laplace transformations for expansions of this type.

Since the second type of expansion is of interest 
here and is not to be found in most of the standard reference 
works it will be discussed here briefly. 

In a classic paper, M. Hamy* derived general expansions of this type for complex integrals of the form

$$\int f(z)\,\varphi^n(z)\,dz$$

under a variety of hypotheses on f(z) and φ(z). These conditions include the case where φ(z) has a saddle point given by the solution of φ'(z) = 0. The result in this case is a generalization of the often-used theorem of Fowler, which one finds in his book on statistical mechanics under the title of the saddle point method.

* Journal de Mathématiques, vol. 4, 6th series, 1908, page 203.

More to the point, they also include the case where φ(z) has one or more maxima on the path of integration at which φ'(z) = 0, provided that f(z) admits a Taylor series expansion about these points. In particular, then, if one considers t as a fixed parameter, they apply to the integral of equation (1), with c = 0, φ(z) = y(p) and f(z) = e^{pt}V_0(p).

In terms of our notation, one finds that:

(a) for an impulsive input with gain maxima at ω = ω0,

2A n (cO x 
V tJ ~ nB'(a>°) COS r V + n B(u, o ):i + term in ^ * 

(b) for a unit step input with gain maxima at ω = ω0 ≠ 0,

2A n (w ) , 

V n (t) ?a COS [ V + nB ^o ] ^ + termS in — ' 

■ v o' o n 

It is interesting to note that these formulae indicate a dependence upon 1/n instead of upon 1/√n, as in the case of the previous expansion. These formulae can be thought of as representing the response in the band-pass case for any fixed t, t ≪ t0.


Appendix I 


Certain remarks of Aurel Wintner* on the justification of the principle of stationary phase are pertinent enough to the above discussion to bear repetition here. Consider the integral

$$S = \int f(x)\,e^{ip\varphi(x)}\,dx.$$

For S to be asymptotically represented, as p → ∞, by the formula (cf. Lamb, Hydrodynamics, p. 395)

$$S \approx f(a)\left[\frac{2\pi}{p\,|\varphi''(a)|}\right]^{1/2}e^{i[p\varphi(a) \pm \pi/4]}, \tag{26}$$

where φ'(a) = 0 and where the upper or lower sign is taken according as φ''(a) is positive or negative, it is evident that two things are sufficient.

(1) The contribution to the integral outside a small interval around the stationary value a of φ(x) must decrease more rapidly as a function of p than the one obtained in the neighborhood of a;

(2) The asymptotic formula given above must adequately represent the behavior of the contribution to the integral from the neighborhood of the stationary value a.

Now, suppose that on a closed interval I, φ'(x) is continuous and has no zeros, and φ(x) is strictly monotone. Then z = φ(x) can be introduced as a variable of integration on that interval, transforming S into

$$S_I = \int_I f(x)\,e^{ip\varphi(x)}\,dx = \int f\left[\varphi^{-1}(z)\right]\frac{d\varphi^{-1}(z)}{dz}\,e^{ipz}\,dz.$$

If, in addition to the above, φ'(x) and φ''(x) are continuous and if f(x) and f'(x) exist and are continuous, this last integral can be integrated by parts, giving

$$S_I = \left[\frac{e^{ipz}}{ip}\,f\left[\varphi^{-1}(z)\right]\frac{d\varphi^{-1}(z)}{dz}\right] - \frac{1}{ip}\int e^{ipz}\,\frac{d}{dz}\left\{f\left[\varphi^{-1}(z)\right]\frac{d\varphi^{-1}(z)}{dz}\right\}dz$$

and showing that on any such interval I,

$$S_I = O\left(\frac{1}{p}\right).$$

Thus, condition (1) will be satisfied if, in the neighborhood of the stationary value a, the contribution to the integral is of larger order than 1/p.

* Method of Stationary Phase, Journal of Math. & Physics, vol. 24, no. 3-4, 1945.

This is clearly the case when the asymptotic formula (26) is valid, since there the dependence on p is as 1/√p. It can be shown that (26) is valid whenever

φ'(a) = 0, φ''(a) ≠ 0, and φ''(x) and f(x)

are of bounded variation in the neighborhood of the stationary value. Thus, to recapitulate, under these conditions the maximum contribution comes from the stationary point and depends on p as 1/√p. The points which are not near the stationary point contribute terms depending upon p only as 1/p.
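Formula (26) is easy to test numerically. The sketch below is in Python with hypothetical function names. It applies the formula to the integrand that arises in the high-pass case: amplitude f(v) = 1/v and phase ψ(v) = 1/v + v, which is stationary at v = 1 with ψ''(1) = 2. The result is compared with a direct composite Simpson-rule evaluation over the arbitrarily chosen interval [0.5, 3]. The endpoints of that interval contribute terms of order 1/p, as described above, so the relative discrepancy should shrink roughly as 1/√p.

```python
import cmath
import math


def simpson(func, lo: float, hi: float, n: int) -> complex:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def stationary_phase(f, phi, phi2, a: float, p: float) -> complex:
    """Formula (26): f(a) [2 pi / (p |phi''(a)|)]^(1/2) exp(i[p phi(a) +/- pi/4])."""
    sign = 1.0 if phi2(a) > 0 else -1.0
    magnitude = f(a) * math.sqrt(2 * math.pi / (p * abs(phi2(a))))
    return magnitude * cmath.exp(1j * (p * phi(a) + sign * math.pi / 4))


def amplitude(v: float) -> float:
    return 1 / v


def psi(v: float) -> float:
    return 1 / v + v


def psi2(v: float) -> float:
    return 2 / v ** 3


p = 400.0
numeric = simpson(lambda v: amplitude(v) * cmath.exp(1j * p * psi(v)), 0.5, 3.0, 100_000)
approx = stationary_phase(amplitude, psi, psi2, 1.0, p)
print(f"quadrature {numeric:.5f}, formula (26) {approx:.5f}, "
      f"relative difference {abs(numeric - approx) / abs(approx):.3f}")
```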

To conclude this brief appendix, it should be remarked that Wintner gives an extension of (26) which is valid under the same condition on f[φ^{-1}(z)] if the first n derivatives of φ(x) vanish at some point a while φ^{(n+1)}(a) does not. These results could be used to extend the treatment of the high-pass case given above to the cases in which a^2 + 2b = 0, etc.



[Figures 3 through 16 (drawings B-392415 to 392428). Surviving labels: curves for n = 100 and n = 50; 125 db.]

Electronic Methods in Telephone Switching 

C. E. Shannon 

In the recent development of electronic digital computing machines various new 
tubes and other electronic devices have been designed which may be of use in 
machine switching. In particular the "selectron" tube developed by R. C. A. and the 
mercury acoustic delay tank provide large cheap memory devices in which information 
can be registered or read off in electronic time intervals (of the order of 
microseconds). Since one of the chief functions of the relays and switches in a 
telephone exchange is that of memory (e.g. the relays remember which calling and called lines should be connected together), it is worthwhile to consider the possibility of using such tubes to replace ordinary electro-mechanical switching equipment.

Suppose we have an exchange (or set of exchanges) serving n subscribers, and that the exchange can handle a peak load of m simultaneous conversations. These may be between any m pairs of the subscribers. Thus the exchange must be capable of assuming as many different states as there are ways of selecting m pairs of objects from n. This can be done in

$$\frac{n!}{m!\,2^m\,(n-2m)!}$$

different ways. For n and m large the logarithm of this is approximately 2m log n. If the logarithm is to the base ten, then this is the required memory capacity of the exchange measured in decimal digits. If the logarithmic base is two, the units are binary digits. A single two-position relay has a capacity of log 2 units (one binary digit, or .30103 decimal digits), while 5 relays have 5 log 2 units. A 10 x 10 crossbar switch has a capacity of 10 log 10, while a single commutator on a panel has capacity log r, where r is the number of vertical positions of the brushes. Hence the number of relays required for a pure relay exchange would be

$$\frac{2m\log n}{\log 2},$$

the number of 10 x 10 crossbars would be

$$\frac{2m\log n}{10\log 10},$$

etc. To these estimates must be added the losses due to inefficient use of the memory 
and also the memory of equipment used for functions other than merely remembering 
which connections are being held at a given time. 
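The memory estimate can be checked numerically. The Python sketch below uses hypothetical function names and arbitrary illustrative values of n = 10,000 subscribers and m = 1,000 simultaneous calls. It computes the logarithm of n!/(m! 2^m (n − 2m)!) with math.lgamma to avoid huge integers, then converts the result into relay and crossbar counts as in the formulas above. Since n!/(n − 2m)! < n^(2m), the estimate 2m log n is an upper bound. The sketch shows how far below it the exact figure falls when m is an appreciable fraction of n.

```python
import math


def log10_states(n: int, m: int) -> float:
    """log10 of n! / (m! 2^m (n - 2m)!): ways to choose m disjoint pairs from n lines."""
    ln = math.lgamma(n + 1) - math.lgamma(m + 1) - m * math.log(2) - math.lgamma(n - 2 * m + 1)
    return ln / math.log(10)


def rough_estimate(n: int, m: int) -> float:
    """The text's approximation 2m log10 n, in decimal digits."""
    return 2 * m * math.log10(n)


n, m = 10_000, 1_000  # hypothetical exchange size
digits = log10_states(n, m)
bits = digits / math.log10(2)
print(f"exact requirement: {digits:.0f} decimal digits = {bits:.0f} binary digits")
print(f"2m log n estimate: {rough_estimate(n, m):.0f} decimal digits")
print(f"relays (one binary digit each): {math.ceil(bits)}")
print(f"10 x 10 crossbars (10 decimal digits each): {math.ceil(digits / 10)}")
```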

An ordinary relay is capable of remembering (by a holding circuit) one binary 
digit. A pair of vacuum tubes in a flip-flop circuit has the same memory capacity. 
The cost of these is of comparable magnitude. Thus, if one designed an electronic telephone exchange by merely changing relays to equivalent vacuum tube circuits, the chief advantage of the electronic circuit would be one of speed, an improvement of order 10^3. In many cases this could produce a reduction of cost, since frequently many identical units of a certain type must be supplied because the individual units are slow. This is apt to be the case with units which are associated with the beginning or end of calls but need not be used during the conversation. On the other hand, equipment to be used throughout the call would offer less advantage under this tube-for-relay replacement, since the expected duration of calls is long compared to electronic times.

The newer electronic memory devices, however, change this picture considerably. 
A selectron tube (when these tubes are in production) may be expected to cost $100 or 
less depending on the demand. It is capable of holding 4096 binary digits, giving a 
cost per binary digit of the order of 2.5 cents, while the cost of the equivalent relay 
may be of the order of 2.5 dollars. Mercury delay lines can store information at a 
comparable cost. Thus it is not impossible that a reduction of the order 100 to 1 in 
switching equipment cost might be possible by the use of electronic devices, even in 
the parts where information must be stored for long periods of time. 
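The cost comparison in this paragraph is simple arithmetic. It is reproduced below in Python using the figures quoted in the text: $100 per selectron, 4096 binary digits per tube, and about $2.50 per relay. The variable names are illustrative only.

```python
selectron_cost_dollars = 100.0   # "$100 or less" per tube, as quoted
selectron_bits = 4096            # binary digits held by one selectron
relay_cost_dollars = 2.5         # "of the order of 2.5 dollars" per relay (one binary digit)

cost_per_bit = selectron_cost_dollars / selectron_bits
print(f"selectron: {100 * cost_per_bit:.2f} cents per binary digit")              # prints 2.44
print(f"relay : selectron cost per binary digit = {relay_cost_dollars / cost_per_bit:.0f} : 1")  # prints 102
```

The ratio of about 100 to 1 is the source of the reduction factor mentioned in the text.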

An indication of how such tubes may be used is given in the attached figure.
Fig. 1 is a block diagram of a simplified exchange. The calling parties are connected
to an electronic commutator which samples the speech signals periodically and puts
the various lines into time-division multiplex. The called parties are also connected
in time-division multiplex to a single channel by means of an electronic commutator
or distributor. The function of the middle part is to rearrange the samples in such a
way as to provide any desired interconnection between calling and called parties. This
is done by dividing the sampling period into two equal parts. During the first half the
signal plate of the upper selectron (tube 1) is connected by gate 1 into the calling-line
multiplex channel, and its windows are caused to open in sequence. Thus at the end of
the first half-cycle the first samples of all the incoming channels have been written on
the face of the tube in their regular order. During the second half-cycle gates 1 and 3
are closed and gates 2 and 4 are opened. The output of the selectron is then fed into
the called-line multiplex, and the windows of the selectron are controlled by the other
selectron, tube 2. This tube has registered, in a suitable notation, the number of the
called line desired by each calling line. The windows of tube 2 are opened
sequentially by the cycling unit, and the numbers registered there control the windows
on tube 1, allowing the sample from calling channel 1 to go into the proper place in
the called-line TDM.
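The two half-cycles amount to what would now be called a time-slot interchange. The following is a minimal Python sketch of one sampling period, assuming one sample per calling line and a one-to-one connection pattern; the selectron face is modelled as a list, and the names `interchange`, `calling_samples` and `register` are hypothetical.

```python
def interchange(calling_samples, register):
    """One sampling period of the simplified exchange (hypothetical model).

    calling_samples[i] -- sample from calling line i, in calling-line multiplex order
    register[i]        -- called line desired by calling line i (the numbers held
                          on selectron tube 2); assumed one-to-one
    Returns the samples in called-line multiplex order.
    """
    # First half-cycle: gate 1 open, windows of tube 1 opened in sequence,
    # so the samples are written on the tube face in their regular order.
    tube_face = list(calling_samples)
    # Second half-cycle: the register steers each stored sample into its
    # place in the called-line time-division multiplex.
    called = [0.0] * len(register)      # a called slot with no caller stays silent
    for i, j in enumerate(register):
        called[j] = tube_face[i]
    return called


# Calling line 0 wants called line 2, line 1 wants line 0, line 2 wants line 1.
print(interchange([0.1, -0.4, 0.7], [2, 0, 1]))   # [-0.4, 0.7, 0.1]
```

Changing the connection pattern only means rewriting `register`; no wiring changes, which is the point of the scheme.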

By a more elaborate system it is possible to make use of the fact that only a small
fraction of the lines will be busy at a given time, as is done in ordinary relay
switching. This can be achieved by supplying only enough places in the distributors
for the peak load. When a call originates, the calling and called parties are assigned
idle places in the distributors. The place assigned to the called party is registered in
the selectron register at the position corresponding to the place assigned to the calling party.
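The bookkeeping for this concentrating arrangement can be sketched as follows. This is a hedged model, not a description of the actual circuit: it assumes separate calling-side and called-side distributors with the same number of places, and the class and method names are hypothetical.

```python
class Concentrator:
    """Hypothetical bookkeeping model: only n_places distributor places for many lines."""

    def __init__(self, n_places):
        self.idle_calling = list(range(n_places))   # idle places, calling-side distributor
        self.idle_called = list(range(n_places))    # idle places, called-side distributor
        self.register = {}                          # calling place -> called place
        self.lines = {}                             # calling place -> (caller, callee)

    def originate(self, caller, callee):
        """Assign idle places to a new call and record the pair in the register."""
        if not self.idle_calling or not self.idle_called:
            raise RuntimeError("no idle place: load exceeds what was provided for")
        a = self.idle_calling.pop(0)
        b = self.idle_called.pop(0)
        self.register[a] = b      # written into the selectron register at place a
        self.lines[a] = (caller, callee)
        return a, b

    def release(self, a):
        """End the call occupying calling place a and return both places to idle."""
        b = self.register.pop(a)
        del self.lines[a]
        self.idle_calling.append(a)
        self.idle_called.append(b)


ex = Concentrator(2)
print(ex.originate(17, 203))   # (0, 0)
print(ex.originate(5, 40))     # (1, 1)
ex.release(0)
print(ex.originate(8, 9))      # (0, 0): place 0 is reused
```

Sizing `n_places` for the peak load rather than for the number of lines is where the saving comes from.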

Some Generalizations of the Sampling Theorem 

We have seen that a function of time f(t) containing
no frequencies over W cycles per second can be described by
giving its values at Nyquist intervals (spaced 1/2W seconds apart).
It can be reconstructed from these samples using the basic
function sin 2πWt/2πWt, together with the same function shifted
by integer numbers of Nyquist intervals. We now consider some
generalizations of this result.

In the first place the particular function
sin 2πWt/2πWt is by no means necessary for the reconstruction.
In fact any function φ(t) which contains all frequencies up to
W is satisfactory. More precisely, the spectrum of φ(t) should
not vanish over any set of frequencies of positive measure
up to W. If φ(t) satisfies this condition the original
function f(t) can be reconstructed using φ(t) and its shifted
images φ(t + k/2W). That is, coefficients a_k can be found such that

$$f(t) = \sum_{k=-\infty}^{\infty} a_k\,\varphi\!\left(t + \frac{k}{2W}\right).$$

In general the coefficients are not found as easily as in the
special case where φ(t) = sin 2πWt/2πWt (when they are merely
the values of f(t) at the Nyquist points), but they may be
calculated as follows. Let F(ω) be the spectrum of f(t) and
Φ(ω) the spectrum of φ(t). Expand the function F(ω)/Φ(ω) in
a Fourier series using -W to +W as the fundamental interval:

$$\frac{F(\omega)}{\Phi(\omega)} = \sum_k a_k\,e^{2\pi i k\omega/2W}, \qquad -W < \omega < W,$$

or

$$F(\omega) = \sum_k a_k\,\Phi(\omega)\,e^{2\pi i k\omega/2W}.$$

Taking the inverse transform of this equation we obtain the desired expansion

$$f(t) = \sum_k a_k\,\varphi\!\left(t + \frac{k}{2W}\right).$$

The coefficients in the expansion can therefore be determined as
the coefficients of a Fourier series expansion of F(ω)/Φ(ω). In
general the functions φ(t + k/2W) will not form an orthogonal set,
and therefore the energy in f(t) cannot be found from Σ a_k² as it
was in the simple case where φ(t) = sin 2πWt/2πWt.
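The recipe can be checked numerically on a finite, periodic analogue. This is an assumption made only so the example is self-contained: take period 1, let the harmonics -M, ..., M stand for the band, and let the N = 2M + 1 shifts k/N stand for the Nyquist points. The Fourier-series step then becomes a discrete Fourier transform of F/Φ. The spectra below are arbitrary test values, with Φ chosen never to vanish in the band.

```python
import numpy as np

M = 4                      # highest harmonic: plays the role of the band limit W
N = 2 * M + 1              # shifts per period: the "Nyquist points" k/N
m = np.arange(-M, M + 1)   # harmonics in the band

rng = np.random.default_rng(0)
c = rng.normal(size=N) + 1j * rng.normal(size=N)   # spectrum F of f (test values)
d = 1.0 + 0.5 * m**2 + 0.3j * m                   # spectrum Phi of phi, never zero

def synth(coeffs, t):
    """Evaluate sum_m coeffs[m] * exp(2 pi i m t) at the times t."""
    return np.exp(2j * np.pi * np.outer(t, m)) @ coeffs

# Fourier-series step: find a_k with sum_k a_k exp(2 pi i m k / N) = F_m / Phi_m.
ratio = np.zeros(N, dtype=complex)
ratio[m % N] = c / d                # store harmonic m at DFT index m mod N
a = np.fft.fft(ratio) / N

# Inverse-transform step: f(t) = sum_k a_k phi(t + k/N).
t = np.linspace(0.0, 1.0, 50, endpoint=False)
recon = sum(a[k] * synth(d, t + k / N) for k in range(N))
err = np.max(np.abs(recon - synth(c, t)))
print(err)   # at the level of rounding error
```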

A physical method of performing this expansion can
also be given. Consider a filter which gives the output
sin 2πWt/2πWt when the input is φ(t). If the function f(t) is
passed through this filter, the amplitudes of the output at the
Nyquist points will be the desired coefficients. This is
true since this output can be considered as expanded in the
functions sin 2πWt/2πWt, shifted to the Nyquist points, with these
amplitudes as coefficients; the inverse filter would restore the
original function and change each of these functions into φ(t) at
the corresponding Nyquist point.
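In the same periodic analogue (period 1, harmonics -M, ..., M, shifts k/N, all assumptions for illustration), the role of sin 2πWt/2πWt is played by the Dirichlet kernel, whose spectrum is 1/N on every harmonic in the band. The sketch below passes f through the filter that turns φ into this kernel and reads the output at the Nyquist points. With the shift convention φ(t + k/2W), the coefficient a_k is the output at t = -k/2W (here -k/N). The spectra are arbitrary test values.

```python
import numpy as np

M = 4
N = 2 * M + 1
m = np.arange(-M, M + 1)
rng = np.random.default_rng(1)
c = rng.normal(size=N) + 1j * rng.normal(size=N)   # spectrum of f (test values)
d = 2.0 + np.cos(m) + 0.2j * m                     # spectrum of phi, nonzero in band

def synth(coeffs, t):
    """Evaluate sum_m coeffs[m] exp(2 pi i m t) at the times t."""
    return np.exp(2j * np.pi * np.outer(t, m)) @ coeffs

kernel_spec = np.full(N, 1.0 / N)   # Dirichlet kernel: 1 at t = 0, 0 at t = k/N, k != 0
filter_gain = kernel_spec / d        # filter that turns phi into the kernel
out_spec = c * filter_gain           # spectrum of the filter output when f is applied

k = np.arange(N)
a = synth(out_spec, -k / N)          # output amplitudes at the Nyquist points

t = np.linspace(0.0, 1.0, 40, endpoint=False)
recon = sum(a[j] * synth(d, t + j / N) for j in range(N))
err = np.max(np.abs(recon - synth(c, t)))
print(err)
```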

A function f(t) can also be determined from a knowledge
of its value and derivative at alternate Nyquist points,
t = k/W (k = 0, ±1, ±2, ...).

We have here the same number of measurements per second, 2W,
but half of these are ordinates of f(t) and half are derivatives.
The reconstruction of f(t) from these values can be carried out
simply using two basic functions:

$$\varphi_1(t) = \frac{\sin^2 \pi W t}{(\pi W t)^2}, \qquad \varphi_2(t) = \frac{\sin^2 \pi W t}{\pi^2 W^2 t}.$$

Both of these lie entirely within the band W. The function φ1
and its first derivative vanish at alternate Nyquist points,
except at t = 0, where φ1 = 1 and φ1' = 0. Likewise φ2 and φ2'
vanish at alternate Nyquist points except at t = 0, where φ2 = 0
and φ2' = 1. Thus we can fit the ordinates of the original function
f(t) using φ1 and its shifted images (shifted by multiples of two
Nyquist intervals). The derivatives of f(t) are fitted using φ2 and
its shifted images. Because of the vanishing of these functions none
of the fittings interfere. The function constructed by this process
must lie within the band and have the same values and derivatives as
the original function f(t) at alternate Nyquist points. That there
is only one such function can be shown by arguments similar to
those used in the basic sampling theorem, generalized by breaking
down the spectrum into an even and an odd part.
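A numerical check of the two-function reconstruction follows. It assumes W = 1 cycle per second and a test signal sin²πW(t - t0)/(πW(t - t0))², which lies within the band W; the offset t0 and the truncation of the infinite sums at |k| ≤ 1000 are choices made for illustration. NumPy's `np.sinc(x)` is sin πx/πx.

```python
import numpy as np

W = 1.0      # band limit in cycles per second (assumed)
t0 = 0.3     # hypothetical offset so the test signal is not centred on a sample point

def phi1(t):
    """sin^2(pi W t) / (pi W t)^2; np.sinc(x) is sin(pi x) / (pi x)."""
    return np.sinc(W * t) ** 2

def phi2(t):
    """sin^2(pi W t) / (pi^2 W^2 t), which equals t * phi1(t)."""
    return t * np.sinc(W * t) ** 2

def f(t):
    """Test signal inside the band W (its spectrum is a triangle ending at W)."""
    return np.sinc(W * (t - t0)) ** 2

def f_prime(t):
    """Exact derivative of f; valid where W (t - t0) != 0, true at the samples used."""
    u = W * (t - t0)
    return 2 * W * np.sinc(u) * (np.cos(np.pi * u) - np.sinc(u)) / u

K = 1000
tk = np.arange(-K, K + 1) / W          # alternate Nyquist points, spaced 1/W apart
t = np.linspace(-3.0, 3.0, 61)
D = np.subtract.outer(t, tk)           # D[i, j] = t_i - tk_j
f_rec = phi1(D) @ f(tk) + phi2(D) @ f_prime(tk)
err = np.max(np.abs(f_rec - f(t)))
print(err)
```

Only values and slopes at the points k/W enter, that is, 2W numbers per second as in the ordinary sampling theorem.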


It is possible to carry this further and determine a
function from a knowledge of its value and first (n - 1)
derivatives at points separated by n Nyquist intervals. In
this case the basic functions are

$$\varphi_s(t) = \frac{t^s}{s!}\left[\frac{\sin(2\pi W t/n)}{2\pi W t/n}\right]^n, \qquad s = 0, 1, \ldots, n-1.$$

These functions possess the properties:

1. They lie within the band W.

2. They vanish at t = nk/2W, k = ±1, ±2, ... (that is, at every
n-th Nyquist point), and so do their 1st, 2nd, ..., (n-1)th
derivatives.

3. At t = 0 the derivatives of φ_s of order less than s vanish,
and the s-th derivative is 1.

Consequently we can reconstruct f(t) by using φ_s to adjust the
s-th derivatives (s = 0, 1, ..., n-1). If the adjustments are made
in order of increasing s, each leaves undisturbed the lower
derivatives already fitted, and the adjustments at different sample
points do not interfere.

The functions φ_s and their spectra are shown in Fig. 1
for the cases n = 1, 2, 3.
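Assuming the basic functions have the form φ_s(t) = (t^s/s!)[sin(2πWt/n)/(2πWt/n)]^n, a short Taylor expansion shows how they behave at t = 0. For n ≥ 3 some higher derivatives do not vanish there, which is why the fits are taken in order of increasing s:

```latex
\left(\frac{\sin x}{x}\right)^{n} = \left(1 - \frac{x^{2}}{6} + O(x^{4})\right)^{n}
  = 1 - \frac{n x^{2}}{6} + O(x^{4}), \qquad x = \frac{2\pi W t}{n},

\varphi_0(t) = 1 - \frac{2\pi^{2} W^{2}}{3n}\,t^{2} + O(t^{4}),
\qquad \varphi_0''(0) = -\frac{4\pi^{2} W^{2}}{3n} \neq 0,

\varphi_s^{(r)}(0) = \binom{r}{s}\, g^{(r-s)}(0) \quad (r \ge s), \qquad
\varphi_s^{(r)}(0) = 0 \quad (r < s), \qquad
g(t) = \left[\frac{\sin(2\pi W t/n)}{2\pi W t/n}\right]^{n}.
```

Since g(0) = 1, the s-th derivative of φ_s at t = 0 is 1 and the lower ones vanish. The conditions at each sample point therefore form a triangular system in the coefficients of φ_0, ..., φ_{n-1}. For n = 2 the only derivative fitted above order zero is the first, and φ_0'(0) = 0, so there is no interference at all.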




March 4, 1948


The Normal Ergodic Ensembles of Functions 

Among the possible probability distributions in a one-
dimensional space certain ones are of special importance because
of their simple mathematical properties and frequent occurrence
in the physical world. The most important of these is the
normal or Gaussian distribution with a density function:

$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right).$$

In an n-dimensional space the most important distribution
function is an n-dimensional generalization of this, the n-
dimensional normal distribution:

$$\frac{\sqrt{|a_{ij}|}}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i,j} a_{ij}\,x_i x_j\right).$$

Here a_ij is the associated quadratic form and |a_ij| the
determinant of this form. This form is positive definite, and
the surfaces of constant probability are found by setting
the argument of the exponential function equal to a constant,

$$\sum_{i,j} a_{ij}\,x_i x_j = C,$$

and are therefore coaxial ellipsoids in the space. The directions
of the axes of these ellipsoids are those of the eigenvectors of the
form a_ij, and the lengths of the axes are inversely proportional
to the square roots of the corresponding eigenvalues. By a rotation
of axes the new coordinate system can be lined up with these
directions and the distribution function reduced to

$$\frac{\sqrt{\lambda_1\lambda_2\cdots\lambda_n}}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_i \lambda_i\,y_i^2\right),$$

where the λ_i are the (positive) eigenvalues and the y_i are the
new coordinates. The form a_ij, being positive definite, has an
inverse A_ij which is also positive definite, with eigenvalues 1/λ_i.

The properties of the n-dimensional normal distribution
which give it particular mathematical importance are the following.

1. If x_i and y_i are two chance vector variables which
are independent and distributed according to n-dimensional
normal distributions with quadratic forms a_ij and b_ij (inverses
A_ij and B_ij), then the chance vector variable z_i = x_i + y_i is
also distributed normally, with the form c_ij whose inverse is

$$C_{ij} = A_{ij} + B_{ij}.$$

2. If x is a normally distributed vector variable and
y_i = Σ_j r_ij x_j is a vector variable which is a linear operation
on x (possibly of smaller dimension than n), then y is normally
distributed, and the inverse of its form is

$$\sum_{s,t} r_{is}\,r_{jt}\,A_{st}.$$

3. Under certain quite broad conditions the resultant of
a large number of small independent chance vector variables
x_i^(s) (s = 1, 2, ..., N), with arbitrary distribution functions,
gives a normal distribution for

$$\sum_{s=1}^{N} x_i^{(s)},$$

provided no term of the sum contributes more than a small
fraction to any of the variances.

4. If the a priori probabilities for each of two
independent vectors x_i and y_i are both normal, the a posteriori
probability of x_i when we know the sum x_i + y_i = z_i is
normally distributed (about a displaced mean, however).

5. The mean value of x_i x_j for x_i normal is given by

$$\overline{x_i x_j} = A_{ij}.$$
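Properties 1, 2, 4 and 5 can be checked by simulation. The sketch below uses NumPy and assumes zero means and a hypothetical 3-dimensional inverse form A, a diagonal B, and a 2 × 3 linear operation r. Sample averages approximate the inverse forms, and the posterior covariance in property 4 is compared with the identity A - A(A + B)^-1 A = (A^-1 + B^-1)^-1.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.6, 0.0],
              [0.6, 1.0, 0.3],
              [0.0, 0.3, 0.5]])     # inverse form A_ij of x (hypothetical values)
B = np.diag([0.4, 0.8, 1.2])       # inverse form B_ij of an independent y
r = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, 2.0]])    # linear operation to a smaller dimension

n = 200_000
x = rng.multivariate_normal(np.zeros(3), A, size=n)
y = rng.multivariate_normal(np.zeros(3), B, size=n)

def mean_products(v):
    """Sample estimate of the mean of v_i v_j over the n draws (rows of v)."""
    return v.T @ v / len(v)

emp_A = mean_products(x)            # property 5: should approach A
emp_C = mean_products(x + y)        # property 1: should approach A + B
emp_R = mean_products(x @ r.T)      # property 2: should approach sum_st r_is r_jt A_st

# Property 4: given z = x + y, x is normal about the displaced mean A (A + B)^-1 z
# with covariance A - A (A + B)^-1 A, which equals (A^-1 + B^-1)^-1.
gain = A @ np.linalg.inv(A + B)
post_cov = A - gain @ A
print(np.round(emp_A, 2))
print(np.round(post_cov, 3))
```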

Among the many possible ergodic ensembles of functions
f_α(t) there is also a certain class of particular mathematical
and physical importance. This class of ensembles can be con-
sidered a generalization of the n-dimensional normal distribution
to infinite-dimensional function spaces ergodic under trans-
lations in time. We shall call these normal ergodic ensembles
of functions. They are completely specified by giving their
power spectra P(ω) or their autocorrelation functions A(τ),
which are the Fourier transforms of the power spectra. The
normal ergodic ensembles can be defined in various ways. They
occur physically when we pass thermal noise through a filter,
shaping the power spectrum to P(ω) = |Y(ω)|², Y(ω) being the
admittance of the filter.
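A discrete-time sketch of the physical construction follows. It assumes circular (FFT-based) filtering of unit-variance Gaussian samples and a hypothetical filter gain |Y|. For a single long record, the time-average autocorrelation agrees with the one predicted from P = |Y|². That agreement is the ergodic property: one member of the ensemble already shows the ensemble statistic.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2 ** 18                                   # samples in one long record (assumed)
k = np.arange(N // 2 + 1)                     # frequency bins returned by rfft
Y = 1.0 / np.sqrt(1.0 + (k / (N / 64)) ** 2)  # hypothetical filter gain |Y|

w = rng.standard_normal(N)                    # flat-spectrum ("thermal") Gaussian noise
y = np.fft.irfft(np.fft.rfft(w) * Y, n=N)     # one member of the shaped ensemble

# Time average over this single record of y(t) y(t + tau), for circular lags tau.
R_time = np.fft.irfft(np.abs(np.fft.rfft(y)) ** 2, n=N) / N
# Autocorrelation predicted by the power spectrum P = |Y|^2 (circular convention).
R_spec = np.fft.irfft(Y ** 2, n=N)
print(R_time[:4])
print(R_spec[:4])
```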

In the literature on noise these ensembles are often
treated in a loose, somewhat illogical fashion by using either
of two "representations." The first representation is

$$\sum_n \sqrt{P(n\Delta f)\,\Delta f}\,\cos(n\Delta f\,t + \theta_n).$$

The θ_n are all uniformly and independently distributed over all
values from 0 to 2π. This representation amounts to making the
noise the sum of a large number of small sinusoidal waves with
random phases, and amplitudes adjusted to give the proper power
density in any small frequency range. The frequency increment
Δf between adjacent waves is supposedly very small, and in use
one evaluates any desired statistic of this set of functions and
determines the limit approached by this statistic as Δf → 0.
This limit is taken to be the desired statistic of the normal
ergodic ensemble. The second representation is similar but uses
normally distributed amplitudes a_n whose variance σ² is equal
to P(nΔf):

$$\sum_n a_n\sqrt{\Delta f}\,\cos(n\Delta f\,t + \theta_n).$$
Actually these "representations" will not give the
correct answer in all cases. For example, if we ask what
fraction of the functions in the representation ensemble
are periodic, we find that all are, so the probability is unity,
and the limit as Δf → 0 is therefore also unity, while almost
none of the functions in the normal ergodic ensemble are periodic.
However, it can be shown that if we restrict ourselves to what we
have called physical statistics, the answers will be identical;
the normal ergodic ensemble is the physical limit of either of
the above ensembles as Δf → 0.
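The periodicity counterexample is easy to see numerically. Every member of the first representation is a finite sum of harmonics of Δf, so it repeats exactly with period 2π/Δf in the cos(nΔf t + θ_n) form used above. The spectrum, the value of Δf and the cutoff at 200 components are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(4)
df = 0.05                                  # frequency increment (assumed)
n = np.arange(1, 201)                      # components n * df (arbitrary cutoff)
P = 1.0 / (1.0 + (n * df) ** 2)            # hypothetical power spectrum P(n df)
theta = rng.uniform(0.0, 2 * np.pi, size=n.size)   # independent uniform phases
amp = np.sqrt(P * df)

def member(t):
    """One function of the first representation: sum_n sqrt(P df) cos(n df t + theta_n)."""
    return np.cos(np.outer(t, n * df) + theta) @ amp

t = np.linspace(0.0, 10.0, 101)
period = 2 * np.pi / df
print(np.max(np.abs(member(t + period) - member(t))))   # rounding-level difference
```

However small Δf is made, the period 2π/Δf is finite, so the fraction of periodic members stays 1 for every Δf.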

A more logical definition of a normal ergodic ensemble 
can be given as follows. We divide the frequency range up into 
unit intervals and construct the sequence of "flat" ensembles 
for these intervals. These will be given by 

$$\sum_n a_n \sin nt$$

These ensembles are passed through shaping filters to give the 
proper power spectrum in the interval in question and the results 

The normal ergodic ensembles have properties analogous
to those of the n-dimensional normal distributions given above.
We have

Theorem: The sum f_α(t) + g_β(t), where f and g are from independent
normal ergodic ensembles with spectra P_1 and P_2, is normal
ergodic with spectrum P_1 + P_2.

Theorem: The output of any linear invariant transducer driven
by a normal ergodic ensemble is normal ergodic with
spectrum |Y(ω)|² P(ω).

Theorem: Any finite-dimensional linear operation on a normal
ergodic ensemble gives a normally distributed vector.

March 15, 1948



Systems Which Approach the Ideal as P/N → ∞

We will show that it is possible, for sufficiently large P/N,
to construct an instantaneous system for transmitting
a sequence of binary digits such that the frequency of errors
is arbitrarily small and the power required is only slightly
greater in db than the ideal for the corrected rate of
transmission. More precisely we have the

Theorem: Given any ε > 0 and δ > 0 we can, for P/N sufficiently
large, transmit binary digits on an instantaneous basis with
frequency of errors < ε and corrected rate of transmission

$$R > W\log\left[1 + (1-\delta)\frac{P}{N}\right].$$

The system to be used is of PCM type with an extremely large
number of amplitude levels. Let there be 2^s levels, and number
them in binary notation, but in the Stibitz type code, so
that only one binary digit changes on going to an adjacent
level. If we are in error by d levels, at most d binary digits
of the s will be incorrect. If there are many levels in the σ
distance (√N) of the noise, the expected number of errors will
be approximately



We take s large enough that εs > a. Thus the frequency of
errors in our final result will be < ε. The levels should not
be spaced uniformly but according to the density of a normal
distribution. If this is done the received signal will be
nearly Gaussian with σ² = P + N, and the corrected rate of
transmission satisfies

$$R > W\log\left[1 + (1-\delta)\frac{P}{N}\right].$$
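The coding step can be sketched in Python. As an assumption, the reflected binary (Gray) code k XOR (k >> 1) serves as the code with the stated property that adjacent levels differ in one digit. The memo does not say how the normal spacing is to be done; placing the levels at the midpoints of 2^s equal-probability cells of a normal density with variance P is one hypothetical choice. The sketch checks that an error of d levels corrupts at most d of the s digits.

```python
from statistics import NormalDist, pvariance

def gray(k):
    """Reflected binary code of level k: adjacent levels differ in exactly one digit."""
    return k ^ (k >> 1)

def gray_to_level(g):
    """Invert gray(): recover the level number from its code word."""
    k = 0
    while g:
        k ^= g
        g >>= 1
    return k

def digit_errors(sent, received):
    """Number of the s binary digits that differ when level `received` is read for `sent`."""
    return bin(gray(sent) ^ gray(received)).count("1")

s = 8                                   # 2**s levels (assumed value)
n_levels = 2 ** s

# An error of d levels corrupts at most d of the s digits.
worst_excess = max(digit_errors(k, k + d) - d
                   for k in range(n_levels) for d in range(1, n_levels - k))

# Plain binary numbering can lose every digit on a one-level error:
print(bin(127 ^ 128).count("1"), digit_errors(127, 128))   # 8 1

# Levels placed according to a normal density: midpoints of 2**s equal-probability
# cells of N(0, P) (one possible choice, assumed here).
P = 1.0
levels = [NormalDist(0.0, P ** 0.5).inv_cdf((k + 0.5) / n_levels) for k in range(n_levels)]
print(worst_excess, pvariance(levels))
```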


March 29, 1948


Theorems on Statistical Sequences

If it is possible to go from any state with P > 0
to any other along a path of probability p > 0, the system is
ergodic and the strong law of large numbers can be applied.
Thus the number of times a given path p_ij in the network is
traversed in a long sequence of length N is about proportional
to the probability of being at i and then choosing this path,
P_i p_ij N. If N is large enough, the probability of a percentage
error of ±δ in this is less than ε, so that for all but a set of
small probability the actual numbers lie within the limits

$$(P_i p_{ij} \pm \delta)N.$$

Hence the probability p of nearly all sequences lies within the
limits given by

$$p = \prod_{i,j} p_{ij}^{(P_i p_{ij} \pm \delta)N},$$

and (log p)/N is limited by

$$\sum_{i,j}(P_i p_{ij} \pm \delta)\log p_{ij},$$

so that

$$\left|\frac{\log p}{N} - \sum_{i,j} P_i p_{ij}\log p_{ij}\right| < \eta.$$

Thus we have the

Theorem: For almost all sequences

$$\lim_{N\to\infty}\frac{-\log p}{N} = H = -\sum_{i,j} P_i p_{ij}\log p_{ij},$$

where p is the probability of the sequence having the block
of length N starting at the first position.

Thus for all but a set of blocks of probability < ε,
and for N large enough,

$$(H - \eta)N < -\log p < (H + \eta)N,$$

$$\sum p\,(H - \eta)N < -\sum p\log p < \sum p\,(H + \eta)N,$$

where we have summed over all but the set of small probability. Here

$$\sum p\,(H + \eta)N \le (H + \eta)N\sum p \le (H + \eta)N$$

and

$$\sum p\,(H - \eta)N = (H - \eta)N\sum p \ge (H - \eta)N(1 - \epsilon).$$

For the set of small probability

$$-\sum p\log p \le \epsilon\log\frac{a^N}{\epsilon},$$

since this is maximized, for Σp = ε, by making all p equal, and
the number of them is at most a^N, a being the number of symbols.
But this is dominated by

$$\left|\sum p\log p\right| \le \epsilon N\log a + |\epsilon\log\epsilon|,$$

which is as small as desired compared with N for sufficiently large
N and small ε. Hence this does not affect the sum in the limit, and
we have the

Theorem:

$$\lim_{L\to\infty} -\frac{1}{L}\sum p(B_L)\log p(B_L) = H,$$

where p(B_L) is the probability of block B_L of length L, and
the sum is over all possible blocks.
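Both theorems can be checked on a small example. The sketch assumes a hypothetical two-state chain whose symbols are the states themselves, with logarithms to base 2. Block entropies are computed by enumerating all blocks, and -log p/N is computed for one long sample path. For a stationary chain the block entropy grows by exactly H per added symbol.

```python
import itertools
import math
import random

p = [[0.9, 0.1],   # hypothetical transition probabilities p_ij
     [0.4, 0.6]]
P = [0.8, 0.2]     # stationary probabilities: P_j = sum_i P_i p_ij

H = -sum(P[i] * p[i][j] * math.log2(p[i][j]) for i in range(2) for j in range(2))

def block_prob(block):
    """Probability of a block of states, starting from the stationary distribution."""
    prob = P[block[0]]
    for a, b in zip(block, block[1:]):
        prob *= p[a][b]
    return prob

def block_entropy(L):
    """-sum over all blocks B_L of p(B_L) log2 p(B_L)."""
    total = 0.0
    for block in itertools.product(range(2), repeat=L):
        q = block_prob(block)
        total -= q * math.log2(q)
    return total

for L in (1, 4, 8, 12):
    print(L, block_entropy(L) / L)   # approaches H as L grows

# One long sample path: -log2 p(sequence) / N should be close to H.
rng = random.Random(3)
N = 100_000
state = 0 if rng.random() < P[0] else 1
log_p = math.log2(P[state])
for _ in range(N - 1):
    nxt = 0 if rng.random() < p[state][0] else 1
    log_p += math.log2(p[state][nxt])
    state = nxt
print(H, -log_p / N)
```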

We now prove the

Theorem:

$$H = -\sum_{i,j} p(B_i, S_j)\log p_{B_i}(S_j) = \lim_{L\to\infty} -\sum_{i,j} q(B_i, S_j)\log q_{B_i}(S_j),$$

where p(B_i, S_j) is the probability of block B_i (of length L)
followed by the symbol S_j, and p_{B_i}(S_j) is the conditional
probability of S_j after the block B_i is known to occur.
q(B_i, S_j) is the probability when B_i is computed on the basis of
any initial state probabilities, not necessarily the proper ones,
and q_{B_i}(S_j) the corresponding conditional probabilities.

The first equality is true since we may sum first on
all B_i leading to a given state k. The terms p_{B_i}(S_j) are then
all equal to p_kj, and the terms p(B_i, S_j) sum to P_k p_kj, which
gives the desired result.

If the q's are used, the q_{B_i}(S_j) are still p_kj, where
k is the state in which B_i ends, and

$$\sum q(B_i, S_j) \to p_{kj}\sum p(B_i),$$

the sums being over the blocks ending in state k,
since any initial distribution tends toward equilibrium.
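The same hypothetical two-state chain illustrates this theorem; symbols are again the states, and logarithms are to base 2. The sketch enumerates all blocks of length L and sums -q(B, S) log q_B(S). With the proper (stationary) start the result equals H for every L. With an arbitrary start it differs from H at first and approaches H as L grows.

```python
import itertools
import math

p = [[0.9, 0.1],
     [0.4, 0.6]]   # hypothetical transition probabilities p_kj
P = [0.8, 0.2]     # stationary (proper) state probabilities
H = -sum(P[i] * p[i][j] * math.log2(p[i][j]) for i in range(2) for j in range(2))

def cond_entropy(start, L):
    """-sum q(B, S) log2 q_B(S) over blocks B of L states, by direct enumeration,
    when the first state is drawn from `start` (not necessarily stationary)."""
    total = 0.0
    for block in itertools.product(range(2), repeat=L):
        qB = start[block[0]]
        for a, b in zip(block, block[1:]):
            qB *= p[a][b]
        k = block[-1]                     # state in which B ends: q_B(S_j) = p_kj
        total -= sum(qB * p[k][j] * math.log2(p[k][j]) for j in range(2))
    return total

print(H, cond_entropy(P, 3))              # stationary start: equal for every L
for L in (1, 2, 4, 8, 12):
    print(L, cond_entropy([1.0, 0.0], L))  # arbitrary start: tends to H
```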

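We have shown that, apart from a set of small probability,
the probabilities of blocks of length M lie within the limits

$$2^{-(H+\delta)M} < p < 2^{-(H-\delta)M},$$

where δ can be made small by taking M large enough. Let the
smallest number of blocks of length M which remain when we delete
a set of measure ε be Q_ε(M). Then

$$\sum p = 1 - \epsilon,$$

$$1 - \epsilon \le Q_\epsilon(M)\,p_{\max} \le Q_\epsilon(M)\,2^{-(H-\delta)M},$$

$$\log Q_\epsilon(M) \ge (H - \delta)M + \log(1 - \epsilon),$$

so that

$$\gamma(\epsilon) = \lim_{M\to\infty}\frac{\log Q_\epsilon(M)}{M} \ge H - \delta.$$

Also

$$1 \ge \sum p \ge Q_\epsilon(M)\,p_{\min} \ge Q_\epsilon(M)\,2^{-(H+\delta)M},$$

from which we obtain γ(ε) ≤ H + δ. Since δ can be made arbitrarily
small,

$$\gamma(\epsilon) = H.$$

Hence we have the

Theorem: γ(ε) = H for ε ≠ 0, 1.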

The fact that for large M nearly all blocks have a
probability limited by

$$\left|\frac{-\log p}{M} - H\right| < \delta$$

does not imply that these probabilities approach equality.
In fact they will generally diverge from one another, but the
db range becomes small compared to M, since for p's satisfying
this inequality

$$\frac{\log p_{\max} - \log p_{\min}}{M} \le 2\delta.$$

It is possible to show, however, that there exists among the
blocks of length M a subset, all of equal probability, which
has the same growth with M as the set including all blocks
except those of small probability totaling less than ε: namely,
the subset will contain more than 2^{(H-δ)M} elements, with δ
arbitrarily small.

Consider all blocks beginning in a given state, say
state 1, and ending in this state (without passing through it in
between). Let these blocks B_1, B_2, ... have lengths n_1, n_2, ...,
n_i, ... and conditional probabilities p_1, p_2, ..., p_i, ... when
we start from state 1.

We first prove the

Theorem:

$$\sum_i p_i n_i = \frac{1}{P_1}, \qquad -\sum_i p_i\log p_i = H\sum_i p_i n_i.$$

The first part is true since the ergodic character of the system
makes the inverse frequency of occurrence of state 1, 1/P_1, equal
to the mean distance between its occurrences, Σ p_i n_i. The
second part is true since almost all blocks of large length N
have approximately the proper frequency of each B_i.
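Both parts can be verified exactly for a hypothetical two-state chain, using state 0 in the role of state 1 and logarithms to base 2. A block from state 0 back to state 0 either returns at once (length 1) or goes to state 1, stays there m - 1 further steps and comes back (length m + 1). The sum over m is truncated at 200, where the omitted probability is negligible.

```python
import math

p = [[0.9, 0.1],
     [0.4, 0.6]]   # hypothetical transition probabilities
P = [0.8, 0.2]     # stationary probabilities
H = -sum(P[i] * p[i][j] * math.log2(p[i][j]) for i in range(2) for j in range(2))

# First-return blocks from state 0: (length n_i, probability p_i).
blocks = [(1, p[0][0])]
for m in range(1, 200):             # truncation: the omitted tail is negligible
    blocks.append((m + 1, p[0][1] * p[1][1] ** (m - 1) * p[1][0]))

mean_len = sum(q * n for n, q in blocks)            # sum p_i n_i
block_ent = -sum(q * math.log2(q) for _, q in blocks)  # -sum p_i log p_i
print(mean_len, 1 / P[0])
print(block_ent / mean_len, H)
```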

Now we return to the construction of a subset of growth
2^{(H-δ)N}, all of equal probability. Let us choose integers
a_i as close as possible to K p_i, K being a large integer,
and construct sequences with a_i of the blocks B_i. The number
of blocks is then

$$\sum_i a_i \approx K,$$

and the number of sequences:

$$\frac{\left(\sum_i a_i\right)!}{\prod_i a_i!} \approx 2^{-K\sum_i p_i\log p_i}.$$

The growth is then, in terms of symbols,

$$\frac{-K\sum_i p_i\log p_i}{K\sum_i p_i n_i} = H.$$

This proves the following

Theorem: Given δ > 0 there exists a set of M blocks of length N
(when N is sufficiently large) such that

$$M > 2^{(H-\delta)N}$$

and each block has the same probability, and starts and ends in
the same state, which can be chosen arbitrarily.

In case the system is not ergodic but made up of a
finite number of ergodic systems,

$$\Gamma = \sum_i c_i\,\Gamma_i,$$

each Γ_i will have a rate H_i, which we may assume arranged in a
non-increasing sequence, H_1 ≥ H_2 ≥ ....

The function γ(ε) then becomes a decreasing step function in the
manner indicated by the following

Theorem: In the case considered,

$$\gamma(\epsilon) = H_j \quad\text{in the interval}\quad \sum_{i<j} c_i < \epsilon < \sum_{i\le j} c_i.$$

For if ε is in the range indicated we must take a set
of positive probability from at least one of Γ_1, ..., Γ_j.
This gives a growth of type 2^{H_j N}

at least, and can be limited to this by choosing all sequences
The quantity

will be called the mean statistical rate for the system.
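The step function in the theorem can be written out directly. This is a sketch: the weights c_i and rates H_i in the example are hypothetical, and at the boundaries of the intervals, where the theorem says nothing, the function simply returns the next value.

```python
def gamma(eps, weights, rates):
    """gamma(eps) = H_j when c_1 + ... + c_{j-1} < eps < c_1 + ... + c_j,
    with the components taken in order of non-increasing rate H_1 >= H_2 >= ..."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    order = sorted(range(len(rates)), key=lambda i: -rates[i])
    cum = 0.0
    for i in order:
        cum += weights[i]
        if eps < cum:       # components of higher rate cannot all be discarded
            return rates[i]
    raise ValueError("weights must sum to 1")


c = [0.5, 0.3, 0.2]        # hypothetical weights c_i
Hs = [1.0, 2.0, 0.5]       # hypothetical rates H_i (any order)
print([gamma(e, c, Hs) for e in (0.1, 0.5, 0.9)])   # [2.0, 1.0, 0.5]
```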


April 26, 1948

Samples of Statistical English
C. E. Shannon

A number of samples of statistical English including
probability structure out to four words are given below. These
were constructed by starting off with three words from a book.
These three words are shown to someone who fits them into a
reasonable English sentence and writes down the word following
the three. The first word is then covered up and the process
repeated with a different person, etc. If the imagined sentence
ends after the added word, the person writing the word adds a
period. For samples bearing a title the participants were told
that this was the subject dealt with. These samples may be
compared with those in "A Mathematical Theory of Communication,"
where less statistical structure is included.
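The procedure is a word-level Markov process of order three. In a hedged Python sketch, the human participants can be replaced by a table of four-word sequences counted from a corpus. The tiny corpus below is a placeholder, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def build_table(words, context=3):
    """For each run of `context` consecutive words, list the words that follow it."""
    table = defaultdict(list)
    for i in range(len(words) - context):
        table[tuple(words[i:i + context])].append(words[i + context])
    return table

def generate(table, start, length, rng):
    """Slide a window of len(start) words, each time drawing a word that was seen
    after the current window (the table stands in for the human participants)."""
    out = list(start)
    width = len(start)
    for _ in range(length):
        choices = table.get(tuple(out[-width:]))
        if not choices:          # window never seen in the corpus: stop
            break
        out.append(rng.choice(choices))
    return out

corpus = ("the cat sat on the mat and the dog sat on the rug and "
          "the cat ran to the dog and the dog sat on the mat").split()
table = build_table(corpus)
sample = generate(table, ("the", "cat", "sat"), 15, random.Random(0))
print(" ".join(sample))
```

Drawing from the list of observed followers reproduces their relative frequencies, which is what a participant completing a plausible sentence approximates.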

The samples given here were obtained, for the most
part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler
and W. E. Mathews. A few of the samples were obtained from
other sources (contemporary literature, etc.) and are included 
for comparison. The reader may try his skill at guessing which 
are statistically constructed. The true sources are given at 
the end. 

1. This was the first. The second time it happened without 
his approval. Nevertheless it cannot be done. It could 
hardly have been the only living veteran of the foreign 
power had stated that never more could happen. Conse- 
quently people seldom try it. 

2. John now disported a fine new hat. I paid plenty for the 
food. When cooked asparagus has a delicious flavor sug- 
gesting apples. If anyone wants my wife or any other 
physicist would not believe my own eyes. I would believe 
my own word. 

3. That was a relief whenever you be let your mind go free 
who knows if that pork chop I took with my cup of tea 
after was quite good with the heat I couldn't smell anything
off it I'm sure that queer looking man in the

4. In a few days was the minimum amount of money remaining to 
the end. However everyone knows the meaning implied. It 
was true when Cutler says that we should proceed carefully.
When you love yourself too much. The woman who

5. Fourscore and twenty years passed before we could meet them 
that isn't already done should have been a good son is 
going fast according to the teacher of his ability. His 
intelligence sufficed for the time. This cannot change 


6. Even the killing was atrociously perpretated by the 
cruelest treatment that a small boy jumped over the hedge 
and buried her. A grave fault of many approaches to the 
furthermost reaches of the state. Politics and business 
are becoming lost to the . 

7. It is an Italian ox mouth dish. The only thing in the 
room is worms. I am the director of the seminar. In an 
evolving hemisphere. C'est Monsieur Jardin. I am a 
patient. Oh my dear Plapsen, you are my dearest Klapsen. 

8. He took it with many other matters are more apparent if
they think so. Is there a reason for supposing that 
most people don't. Nevertheless sex is absolutely neces- 
sary as though the electron diffraction camera plate up 
on the top surface of 

9. Fifteen years before the mast, he ever had eaten. Try
it and see, I believe that whatever arises a fund has
been accumulated sufficiently in the near future holds
many surprises. No man can judge his actions by his wife
Susie.

10. I forget whether he went on and on. Finally he stipulated 
that this must stop immediately after this. The last time 
I saw him when she lived. It happened one frosty look of
trees waving gracefully against the wall. You never can 

11. When I bought my wife a long time ago. I knew that it 
wasn't faster when he didn't eat or drink a toast to 
John Doe, otherwise known as McMillan's theorem. 
Whatever the nature of Christ's teachings. Go far into 

12. McMillan's Theorem 

McMillan's theorem states that whenever electrons diffuse 
in vacua. Conversely impurities of a cathode. No sub- 
stitution of variables in the equation relating these 
quantities. Functions relating hypergeometric series 
with confluent terms converging to limits uniformly 
expanding rationally to represent any function. 

13. House Cleaning

First empty the furniture of the master bedroom and bath. 
Toilets are to be washed after polishing doorknobs the 
rest of the room. Washing windows semi-annually is to be 
taken by small aids such as husbands are prone to omit 


14. Epiminondas 

Epiminondas was one who was powerful especially on land 
and sea. He was the leader of great fleet maneuvers and 
open sea battles against Pelopidas but had been struck on 
the head during the second Punic war because of the wreck 
of an armored frigate. 

15. Salaries 

Money isn't everything. However, we need considerably 
more incentive to produce efficiently. On the other hand 
too little and too late to suggest a raise without a reason
for remuneration obviously less than they need although 
they really are extremely meager. 

16. Murder Story 

When I killed her I stabbed Claude between his powerful 
jaws clamped cruelly together. Screaming loudly despite 
fatal consequences in the struggle for life began ebbing 
as he coughed hallowly spitting blood from his ears. 
Burial seemed unnecessary since further division was 

The sources are: 3, from "Ulysses" by James Joyce, 
page 748; 7 and 14 are the conversation and writings of two 
schizophrenic patients (quoted from Bleuler, "A Textbook of 
Psychiatry"). All others constructed by statistical means. 

C. E. SHANNON

June 11, 1948

The Department of Defense 
Washington 25, D. C. 

Prepared by 





C. E. Shannon 
Bell Telephone Laboratories 
Murray Hill, N. J.

1. Introduction.

A general communication system is shown in Figure 3. An information source 
produces a message. This is encoded in a transmitter to produce a signal suitable for 
transmission over the channel. During transmission the signal may be perturbed by 
noise. The perturbed signal is decoded or demodulated at the receiver to recover, as 
well as possible, the original message. 

The situation is roughly analogous to a transportation system for transporting physical 
goods from one point to another. We can imagine, for example, a lumber mill producing 
lumber at an average rate of R cubic feet per second and a conveyor system capable of 
transporting C cubic feet per second. If R is greater than C the full output of the mill 
cannot possibly be carried on the conveyor. On the other hand, if R is less than or equal 
to C it may or may not be possible, depending on whether the lumber can be efficiently 
packed in the available space of the conveyor. However, if we allow ourselves to saw
the lumber up into suitable sizes and shapes we can always approach 100 per cent effi-
ciency in packing. In this case we must, of course, supply a carpenter shop at the other
end of the conveyor to reassemble the lumber in its original form before passing it on
to the consumer.

If the analogy is sound we might hope to define two parameters R and C associated
with an information source and a channel, respectively. R should measure, in some
sense, how much information is produced per second by the source, and C the capacity
of the channel when used in the most efficient manner for transmitting information. We
would expect then that if R > C the full output of the source cannot be transmitted satis-
factorily. If R ≤ C it should be possible to transmit the output of the source by proper
encoding and decoding at transmitter and receiver. It turns out that it is possible to
define quantities R and C which measure these information rates and capacities and
satisfy the desired relationships. We will attempt to show how this can be done without,
however, giving mathematical proofs of the results.¹

¹For mathematical details, see Shannon, C. E., "A Mathematical Theory of Commu-
nication," Bell System Technical Journal, July and October, 1948. See also Shannon, C. E.,
"Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

2. The Information Source.

The first problem is that of clarifying the nature of "information" and finding a
measure of the rate of production for an information source.

Information involves basically the concept of "choice." An information source
chooses one particular message from a set of possible messages. If there were only
one possible message there would be no communication problem. The amount of informa-
tion produced by a source must evidently be related to the range of choice available.

The simplest possible choice is a choice from two equally likely possibilities, say 0
or 1. We shall call the corresponding unit of information a binary digit or "bit." A
relay or flip-flop circuit has two possible states and is capable of storing one bit of
information.

A device which chooses at random between 0 and 1, making one choice each second, is
considered to be producing information at a rate R of one bit per second. Such a source
produces a "message" which is a random sequence of 0's and 1's.

A choice from, say, 32 equally likely possibilities can be considered as a series of five
choices, each from two equally likely possibilities, and, therefore, should correspond to
five bits. More generally, a choice from n equally likely possibilities represents
log₂ n bits.

Suppose now that the various possible choices have different probabilities of occur-
rence, say p_1, p_2, ..., p_n. How much information is produced when a choice is made under
these circumstances? One feels intuitively that less "choice" is involved in a device
which chooses between 0 and 1 with probabilities .01 and .99 than in one which chooses
with equal probabilities. In the former case the result is almost sure to be 1.

The following example shows that by proper encoding an average compression can be
obtained by using the probabilities p_1, p_2, ..., p_n. Suppose there are four possible choices
A, B, C, D with probabilities p_A = 1/2, p_B = 1/4, p_C = 1/8, p_D = 1/8. If we use a simple
direct code into binary digits:

A = 00 B = 01 C = 10 D = 11,

we use two binary digits per letter. On the other hand, using the following code, where
more probable letters are given short codes and less probable letters longer codes, we
obtain an average saving:

A = 0 B = 10 C = 110 D = 111.

This is a reversible code; the original text can be recovered from the encoded sequences 
as is readily verified. With this code we need, on the average, only 

(1/2 x 1 + 1/4 x 2 + 1/8 x 3 + 1/8 x 3) = 1 3/4 

binary digits per letter. We may say then that a choice with probabilities 1/2, 1/4, 1/8, 
1/8 corresponds to 1 3/4 bits of information. If an information source were producing 
a sequence of the letters A, B, C, D with these probabilities we could encode it into a 
sequence of binary digits in which 1 3/4 binary digits are used on the average for each
letter of message. 
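The reversibility and the 1 3/4 average can be confirmed with a short Python sketch. The `encode` and `decode` helpers are illustrative names. Decoding works symbol by symbol because no code word is the beginning of another.

```python
from fractions import Fraction

probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(text):
    return "".join(code[ch] for ch in text)

def decode(bits):
    inverse = {v: k for k, v in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:          # no code word is a prefix of another
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

avg_len = sum(probs[s] * len(code[s]) for s in code)
print(avg_len)                      # 7/4
msg = "ABACABAD"                    # letter frequencies match the probabilities
print(encode(msg), len(encode(msg)))
```

The eight-letter message takes 14 binary digits, that is 1 3/4 per letter, against 16 with the direct two-digit code.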

A general analysis of the situation shows that if the letters are chosen with probabili-
ties p_1, p_2, ..., p_n, then it is possible to encode into binary digits using

$$H = -\sum_i p_i \log_2 p_i$$

binary digits per letter of message on the average, and there is no method of reversible
encoding using less. This H then is the equivalent number of bits per letter, and, if the
source produces n letters per second, R = nH is the rate of production in bits per second.
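Evaluating the formula for the cases in the text gives the expected values. The following is a minimal sketch; the helper name `entropy_bits` is hypothetical.

```python
import math

def entropy_bits(probs):
    """H = -sum p_i log2 p_i, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([1/2, 1/4, 1/8, 1/8]))   # 1.75, the average of the code above
print(entropy_bits([0.01, 0.99]))           # about 0.08 bits: little "choice"
print(entropy_bits([1/32] * 32))            # 5.0, i.e. log2 32
```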


In the case of English text the statistical structure is more involved. There are the
various letter probabilities p_i, but, also, there are statistical influences between nearby
letters. For example, the letter T is more often followed by H than by any other letter,
and Q is almost invariably followed by U, etc. In such cases there is a more general formula
for calculating the equivalent number of bits per letter of message. Let p(i, j, ..., s) be
the probability in the language of the sequence of letters i, j, ..., s. Then we define

$$G_n = -\frac{1}{n}\sum_{i,j,\ldots,s} p(i, j, \ldots, s)\log_2 p(i, j, \ldots, s),$$

where the sum is over all sequences of letters which are just n letters long. The
sequence G_1, G_2, ..., G_n, ... represents a series of approximations to the desired H which
takes into account more and more of the statistical structure as we proceed along the
sequence. The information per letter of message can be defined by the limiting value of
the G's:

$$H = \lim_{n\to\infty} G_n.$$


It can be shown that H has the desired properties; namely, we can encode the messages 
from the source into binary digits using H binary digits per letter on the average, and no 
method of encoding uses less. 
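G_n can be estimated from a text sample by replacing p(i, j, ..., s) with the relative frequencies of the overlapping n-letter blocks. The sketch below makes that substitution, which is an assumption of the sketch. With a short sample the estimates for larger n come out too low, because most long blocks occur only once; reliable values need far more text than shown here.

```python
import math
from collections import Counter

def G(text, n):
    """Estimate G_n = -(1/n) sum p(block) log2 p(block) from the overlapping
    n-letter blocks of a sample text (relative frequencies stand in for p)."""
    blocks = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(blocks.values())
    return -sum(c / total * math.log2(c / total) for c in blocks.values()) / n

sample = "the theory of communication deals with the transmission of the message"
for n in range(1, 5):
    print(n, round(G(sample, n), 3))
```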

For the English language H has been estimated at roughly 2 bits per letter, taking 
account only of the statistical structure out to about 6 or 8 letters. 

If the messages produced by the information source are continuous functions of time,
as in speech or television transmission, the situation is much more involved and we will
not discuss it in detail. It is still possible to assign a rate of production of information
in bits per second to such a source, but the rate now depends on other considerations.
With continuous functions as messages, exact reproduction is not generally required and 
the rate R depends on the amount and nature of the discrepancy which can be tolerated 
between the original and recovered messages. The tolerable discrepancy in turn is 
determined by the final destination of the messages. With speech, for example, the toler- 
able errors depend on the structure of the human ear and brain. 

Although the mathematical problems involved in defining the rate for a continuous 
source have been completely solved, it is in practical cases very difficult to estimate R. 
The following calculation may be of some interest, however. Suppose we are interested 
only in transmitting English speech (no music or other sounds), and the quality requirements on reproduction are only that it be intelligible as to meaning. Personal accents, inflections, etc., can be lost in the process of transmission. In such a case we could, at least in principle, transmit by the following scheme. A device is constructed at the transmitter which prints the English text corresponding to the spoken words. These can be translated into binary digits in the ratio of about two binary digits per letter, or 2 × 4.5 = 9 per word. Taking 100 words per minute as a reasonable talking speed, we obtain 900 bits per minute, or 15 bits per second, as an estimate of the rate for English speech when intelligibility is the only fidelity requirement.
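The arithmetic of this estimate can be set out step by step. The text does not state the figure of 4.5 letters per word directly. It is the value implied by 900 bits per minute at 100 words per minute and two binary digits per letter. The function and parameter names are hypothetical.

```python
def speech_rate_bits_per_second(bits_per_letter: float = 2.0,
                                letters_per_word: float = 4.5,
                                words_per_minute: float = 100.0) -> float:
    """Rate for intelligible English speech sent as printed text."""
    bits_per_word = bits_per_letter * letters_per_word   # about 9 bits per word
    bits_per_minute = bits_per_word * words_per_minute   # about 900 bits per minute
    return bits_per_minute / 60.0                        # about 15 bits per second


if __name__ == "__main__":
    print(speech_rate_bits_per_second())
```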

3. The Capacity of a Channel

We now consider the problem of defining the capacity C of a channel for transmitting information. Since we have measured the rate of production for an information source in bits per second, the natural question is: how many bits per second can be transmitted over a given channel?

In some cases the answer is simple. With a teletype channel sending n of its five-unit symbols per second, we can send 5n bits per second.

Suppose now that the channel is defined by the fact that we can transmit over it functions of time f(t) which lie within a certain frequency band, W cycles per second wide. It is known that a function of this type can be specified by giving its values at a series of equally spaced sampling points 1/(2W) seconds apart. Thus we may say that such a function has 2W degrees of freedom, or dimensions, per second.
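The sampling statement can be illustrated numerically. The sketch below assumes the usual cardinal-series (sinc) interpolation, which the text does not spell out. It samples a hypothetical 0.2 cps cosine every 1/(2W) seconds, about 2TW samples in T seconds, and rebuilds the function between the sample points. Because the series is truncated to the samples available, the rebuilt value is exact only at the sample points.

```python
import math
from typing import Sequence


def sinc(u: float) -> float:
    """sin(pi u) / (pi u), with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(samples: Sequence[float], W: float, t: float) -> float:
    """Approximate f(t) from samples f(k/2W), k = 0, 1, 2, ...

    Uses the cardinal series sum_k f(k/2W) sinc(2Wt - k), truncated to the
    samples supplied; away from the sample points the truncation adds a
    small error that shrinks as more samples surround t.
    """
    return sum(s * sinc(2.0 * W * t - k) for k, s in enumerate(samples))


def example_signal(t: float) -> float:
    """A hypothetical test signal: a 0.2 cps cosine, inside a band W = 1 cps."""
    return math.cos(2.0 * math.pi * 0.2 * t)


if __name__ == "__main__":
    W, T = 1.0, 1000.0                                   # band (cps) and duration (s)
    samples = [example_signal(k / (2.0 * W)) for k in range(int(2 * T * W) + 1)]
    t = 500.25                                           # between two sample points
    print(example_signal(t), reconstruct(samples, W, t))
```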

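If there is no noise whatever, the capacity is infinite. Even when there is noise, if we place no limitation on the transmitter power, the capacity will be infinite, for we may use an arbitrarily large number of different amplitude levels at the transmitter. The capacity depends, of course, on the type of noise and on the limitation placed on the transmitter power.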
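The simplest type of noise is white noise, in which the distribution of amplitudes is Gaussian and the power is spread uniformly over the band; let N be its average power, measured as the power delivered into a unit resistance.

The simplest limitation on transmitter power is to require that the average transmitted power be no greater than a fixed value P. For a channel with these three parameters W, P, and N, the capacity C can be calculated. It turns out to be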

$$C = W \log_2 \frac{P + N}{N} \quad \text{(bits per second)}.$$

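A rough argument for this formula runs as follows. With signal plus noise of average power P + N and noise of average power N, we can distinguish about $\sqrt{(P+N)/N}$ different amplitudes at each sample point. In a time T there will be 2TW independent samples. Thus there are approximately

$$M = \left(\sqrt{\frac{P+N}{N}}\right)^{2TW} = \left(\frac{P+N}{N}\right)^{TW}$$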

different signal functions of duration T that can be distinguished from one another in spite of the noise. This corresponds to

$$\log_2 M = TW \log_2 \frac{P+N}{N}$$

binary digits in the time T, or

$$C = W \log_2 \frac{P+N}{N}$$

binary digits per second. This formula has a much deeper and more precise significance than the above argument would indicate. In fact it can be shown that it is possible, by properly choosing our signal functions, to transmit $W \log_2 \frac{P+N}{N}$ binary digits per second with as small a frequency of errors as desired. It is not possible to transmit binary digits at any higher rate with an arbitrarily small frequency of errors. This means that the capacity is a sharply defined quantity in spite of the noise. These statements are proved by two different methods.*
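The counting argument can be checked numerically. The sketch uses the rough picture from the text: about $\sqrt{(P+N)/N}$ distinguishable amplitudes at each of 2TW samples. It shows that $\log_2 M / T$ gives back $W \log_2 ((P+N)/N)$. The function names and example values are illustrative only.

```python
from math import log2


def capacity(W: float, P: float, N: float) -> float:
    """C = W log2((P + N)/N) in bits per second, for white Gaussian noise."""
    return W * log2((P + N) / N)


def log2_signal_count(T: float, W: float, P: float, N: float) -> float:
    """log2 M for the rough count M = (sqrt((P + N)/N)) ** (2TW)."""
    levels_per_sample = ((P + N) / N) ** 0.5   # distinguishable amplitudes per sample
    samples = 2.0 * T * W                       # independent samples in time T
    return samples * log2(levels_per_sample)


if __name__ == "__main__":
    W, P, N, T = 3000.0, 1000.0, 1.0, 10.0     # example values, chosen arbitrarily
    print(capacity(W, P, N), log2_signal_count(T, W, P, N) / T)
```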

The formula for C applies for all values of P/N. Even when P/N is very small, the average noise power being much greater than the average transmitter power, it is possible to transmit binary digits at the rate $W \log_2 \frac{P+N}{N}$ with as small a frequency of errors as desired. In this case $\log_2 (1 + P/N)$ is approximated by $(P/N)\log_2 e = 1.443\,P/N$, and we have approximately

$$C = 1.443\,W\,\frac{P}{N}.$$
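The quality of the small-P/N approximation can be seen by comparing it with the exact formula. The sketch below does this for a unit band. The approximation always overestimates slightly, and the ratio approaches 1 as P/N goes to 0. The function names are illustrative.

```python
from math import e, log2


def exact_capacity(W: float, snr: float) -> float:
    """C = W log2(1 + P/N) bits per second."""
    return W * log2(1.0 + snr)


def low_snr_capacity(W: float, snr: float) -> float:
    """Small-P/N approximation C ~ W (P/N) log2 e = 1.443 W P/N."""
    return W * snr * log2(e)


if __name__ == "__main__":
    for snr in (1.0, 0.1, 0.01):
        print(snr, exact_capacity(1.0, snr), low_snr_capacity(1.0, snr))
```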

It should be emphasized that it is only possible to transmit at a rate C over a channel 
by properly encoding the information. In general, the rate C is only approached as a limit 
by using more and more complex encoding and longer and longer delays at both trans- 
mitter and receiver. In the white noise case the best encoding is such that the transmitted 
signals themselves have the structure of a white noise with power P. The difficulty with 
the approximate argument given for that case, and the reason it does not give a sharply defined capacity, is that the selection of signals is not optimal. The distribution of amplitudes is not Gaussian as it should be.

4. Comparison of Ideal and Practical Systems*

In Figure 4 the curve is the function

$$\frac{C}{W} = \log_2\left(1 + \frac{P}{N}\right)$$

plotted against P/N measured in db. It therefore represents the channel capacity per unit of band with white noise. The circles and points correspond to PCM and PPM systems used to send a sequence of binary digits and adjusted to give about one error in $10^5$ binary digits. In the PCM case the number adjacent to a point is the number of amplitude levels; 3, for example, is a ternary PCM system. In all cases positive and negative amplitudes are used. The PPM systems are quantized with a discrete set of possible positions for the pulse, the spacing is 1/(2W), and the number adjacent to a point is the number of possible positions for a pulse.

The series of points follows a curve of the same shape as the ideal but displaced 
horizontally about 8 db. This means that with more involved encoding or modulation sys- 
tems a gain of 8 db. in power could be achieved over the system indicated. 

See Shannon, C. E., "Mathematical Theory of Communication" and "Communication 
in the Presence of Noise." 


Of course, as one attempts to approach the ideal, the transmitter and receiver required become more complicated and the delays increase. For these reasons there will be some point where an economic balance is established between the various factors. It may well be, however, that even at the present time more complex systems would be justified.

A curious fact illustrating the general misanthropic behaviour of Nature is that at both extremes of P/N (when we are well outside the practical range) the points plotted in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points approach to within 4.5 db. of the ideal, while with very small P/N the PPM points approach to within 3 db. The relation

$$C = W \log_2\left(1 + \frac{P}{N}\right)$$

can be regarded as an exchange relation between the parameters W and P/N. Keeping the channel capacity fixed, we can decrease the bandwidth W provided we increase P/N sufficiently. Conversely, an increase in band allows a lower signal-to-noise ratio in the channel. Figure 5 shows the required P/N in db. as a function of the band W. It is assumed here that as we increase W, N increases proportionally:

$$N = W N_0,$$

where $N_0$ is the noise power per cycle of band. It will be noticed that if P/N is large, a reduction of band is very expensive in power. Halving the band roughly doubles the signal-to-noise ratio in db. that is required.
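The exchange can be made concrete by solving $C = W \log_2(1 + P/N)$ for P/N. Halving W squares $1 + P/N$, and when P/N is large this nearly doubles the figure in db. The sketch below assumes the noise model $N = W N_0$ stated above. The function names and the example capacity are hypothetical.

```python
from math import log10


def required_snr(C: float, W: float) -> float:
    """Solve C = W log2(1 + P/N) for P/N."""
    return 2.0 ** (C / W) - 1.0


def required_snr_db(C: float, W: float) -> float:
    """The same signal-to-noise ratio expressed in db."""
    return 10.0 * log10(required_snr(C, W))


def required_power(C: float, W: float, N0: float) -> float:
    """Transmitter power P needed when the noise power is N = W N0."""
    return required_snr(C, W) * W * N0


if __name__ == "__main__":
    C = 10.0                       # bits per second, an arbitrary example
    for W in (2.0, 1.0, 0.5):
        print(W, round(required_snr_db(C, W), 1))
```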

The channel capacity C can be calculated in many other cases. A general result that applies in any situation where the average transmitter power is limited to P is that the channel capacity is bounded by:

$$W \log_2 \frac{P + N_1}{N_1} \le C \le W \log_2 \frac{P + N}{N_1}$$

where $N_1$ is a parameter called the "entropy power" of the noise. It is defined as the power in a white noise having the same entropy as the actual noise. N is, as before, the average noise power.
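The entropy power bound can be illustrated with a noise model that does not appear in the text: independent noise samples uniformly distributed on [-a, a]. The entropy power is the power of white Gaussian noise with the same entropy per sample, $N_1 = e^{2h}/(2\pi e)$, where h is in nats. For this uniform noise the ratio $N_1/N$ works out to $6/(\pi e) \approx 0.70$. For Gaussian noise $N_1 = N$, and the two bounds coincide. The function names are illustrative.

```python
from math import e, exp, log, log2, pi


def entropy_power(h_nats: float) -> float:
    """Power of white Gaussian noise whose entropy per sample equals h (nats)."""
    return exp(2.0 * h_nats) / (2.0 * pi * e)


def capacity_bounds(W: float, P: float, N: float, N1: float):
    """Bounds W log2((P + N1)/N1) <= C <= W log2((P + N)/N1)."""
    return W * log2((P + N1) / N1), W * log2((P + N) / N1)


if __name__ == "__main__":
    a = 1.0                            # assumed noise: uniform on [-a, a]
    N = a * a / 3.0                    # its average power (variance)
    N1 = entropy_power(log(2.0 * a))   # entropy of a uniform density is ln(2a)
    print(N1 / N)                      # equals 6/(pi e)
    print(capacity_bounds(1.0, 10.0, N, N1))
```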




Nyquist, H., "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324.

Nyquist, H., "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Transactions, Vol. 47, April 1928, p. 617.

Hartley, R. V. L., "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.

Shannon, C. E., "A Mathematical Theory of Communication," Bell System Technical Journal, July, October 1948.

Shannon, C. E., "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

Tuller, W. G., Sc.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of Technology, 1948.

Wiener, N., The Interpolation, Extrapolation and Smoothing of Stationary Time Series, NDRC Report (Forthcoming as a book to be published by John Wiley and Sons, Inc., New York).

Wiener, N., Cybernetics, John Wiley and Sons, Inc., New York, 1948.

Bailey, R. D., and Singleton, H. E., "Reducing Transmission Bandwidth," Electronics, August 1948, p. 107.



Note on Certain Transcendental Numbers 
Claude E. Shannon 

This note calls attention to a certain class of numbers that are easily shown to be transcendental but seem to have escaped previous notice. A typical example is the number

$$X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}},$$

or more precisely $X = \lim_{n \to \infty} X_n$, where $X_{n+1} = 2^{-X_n}$, starting from some $X_0$. It is easily seen that X exists and satisfies the equation $X = 2^{-X}$. It is known from a conjecture of Hilbert, proved by Gelfond and by Schneider, that $a^x$ is transcendental if $a \neq 0, 1$ is algebraic and x is an algebraic irrational. Now X is clearly not rational. If we suppose it an algebraic irrational, then $2^{-X} = X$ must be transcendental, a contradiction. Hence X is transcendental.

More generally, let f be a function such that if x is algebraic and does not belong to a set S, then f(x) is transcendental. Let $g_1$ and $g_2$ be algebraic functions such that $x \neq g_1 f g_2 x$ for $x \in S$. Then the solutions of

$$x = g_1 f g_2 x$$

are transcendental by a similar argument, using the fact that $g_1^{-1}$ is algebraic. If the sequence $X_n = (g_1 f g_2)^n X_0$ approaches a limit X, it must be transcendental. Some functions known to have the property required for f are sin x, $e^x$ and $J_0(x)$, the exceptional set S consisting of the number 0.
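The limit that defines X can be watched numerically. This sketch iterates $X_{n+1} = 2^{-X_n}$ from a few starting values. The starting values and the step count are arbitrary assumptions. The iteration shows only that the limit exists and satisfies $X = 2^{-X}$, with X close to 0.6412. It says nothing about transcendence, which is the content of the argument above.

```python
def iterate(x0: float = 1.0, steps: int = 200) -> float:
    """Return X_steps for the recursion X_{n+1} = 2 ** (-X_n), X_0 = x0."""
    x = x0
    for _ in range(steps):
        x = 2.0 ** (-x)
    return x


if __name__ == "__main__":
    for x0 in (0.0, 1.0, 5.0):      # arbitrary starting values
        X = iterate(x0)
        print(x0, X, X - 2.0 ** (-X))
```

For $x \ge 0$ the map $x \mapsto 2^{-x}$ shrinks distances by at least the factor $\ln 2 < 1$, so every starting value leads to the same limit.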


October 27, 1948 


Consider a discrete channel with two possible symbols, 0 and 1. Noise is assumed to affect successive symbols independently, and in such a way that the probability of a symbol being interpreted correctly at the receiver is $p = \tfrac{1}{2}(1 + \varepsilon)$, while the probability of incorrect interpretation is $q = \tfrac{1}{2}(1 - \varepsilon)$. The capacity of such a channel is

$$C = \log 2 + p \log p + q \log q = \tfrac{1}{2}\left[(1+\varepsilon)\log(1+\varepsilon) + (1-\varepsilon)\log(1-\varepsilon)\right].$$

We assume ε very small and approximate $\log(1 \pm \varepsilon)$ by $\pm\varepsilon - \varepsilon^2/2$, which gives

$$C \approx \frac{\varepsilon^2}{2} \quad \text{(natural units)}.$$

In bits per symbol, the capacity is

$$C \approx \frac{\varepsilon^2}{2}\log_2 e \approx 0.72\,\varepsilon^2.$$

A very simple code can be constructed for this system to send a sequence of random binary digits at nearly the rate C with a quite small frequency of errors; in other words, a code which is not far from the ideal. The code is merely to repeat each binary digit in the message a large number n of times. At the receiver, a group of n is received, and the majority report is taken as the original message symbol.

If the message symbol is 0, then n 0's are transmitted. At the receiver the n received symbols will be a mixture of 0's and 1's. The number of 0's present will follow a binomial distribution with p and q as above. For large n the binomial distribution is approximately normal (and this approximation is especially good when p is close to ½). The expected number of 0's is pn, and the standard deviation is

$$\sigma = \sqrt{npq} \approx \tfrac{1}{2}\sqrt{n}.$$

An error occurs when the number of received 0's is less than n/2, i.e., when the actual number of zeros is $pn - n/2 = n\varepsilon/2$ away from the expected number. In terms of σ this is

$$a = \frac{n\varepsilon/2}{\tfrac{1}{2}\sqrt{n}} = \varepsilon\sqrt{n}$$

standard deviations. Hence the frequency of errors is given by the area under a normal curve with standard deviation equal to unity from a out to ∞.

To obtain a frequency of errors $10^{-3}$, say, we must have $a \approx 3.1$, so that $n = a^2/\varepsilon^2 \approx 9.6/\varepsilon^2$. The rate is then $1/n \approx \varepsilon^2/10$, as compared with the rate $C \approx 0.72\,\varepsilon^2$ of the ideal (with essentially zero frequency of errors).
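The repetition code can also be analysed exactly instead of through the normal approximation. The sketch assumes that each received copy is correct with probability p = (1 + ε)/2, independently of the others, and that n is odd, so the majority vote can never tie. It compares the exact capacity with the small-ε form $(\varepsilon^2/2)\log_2 e$. The function names are hypothetical.

```python
from math import comb, e, log2


def majority_error(n: int, eps: float) -> float:
    """Exact probability that majority decoding of n (odd) repeats is wrong."""
    p = (1.0 + eps) / 2.0          # assumed probability a copy arrives correctly
    q = 1.0 - p
    # the vote fails when at most n // 2 of the n copies arrive correctly
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(n // 2 + 1))


def capacity_bits(eps: float) -> float:
    """Exact capacity 1 + p log2 p + q log2 q, in bits per symbol (|eps| < 1)."""
    p = (1.0 + eps) / 2.0
    q = 1.0 - p
    return 1.0 + p * log2(p) + q * log2(q)


def small_eps_capacity_bits(eps: float) -> float:
    """Small-eps approximation (eps**2 / 2) log2 e."""
    return eps**2 / 2.0 * log2(e)


if __name__ == "__main__":
    eps = 0.2
    a = 3.1                         # normal deviate for roughly 1e-3 errors
    n = int((a / eps) ** 2) | 1     # repeats needed, forced to be odd
    print(n, majority_error(n, eps), 1.0 / n, capacity_bits(eps))
```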

November 15,

C. E. Shannon

December 6, 1948

Note on Reversing A Discrete Markhoff Process 

In "A Mathematical Theory of Communication" a 
language was represented by a discrete Markhoff process with 
a finite number of possible states. Such a stochastic process 
can be represented schematically by means of an oriented linear 
graph as in Fig. 1.

Consider the question of generating the same language in reverse; for example, English but read backwards. Can we always invert a finite state Markhoff process and obtain a finite state Markhoff process? The answer is "yes". Furthermore, the corresponding linear graph has the same topology, but with reversed orientation on all branches. Suppose the original process has state probabilities $P_i$ and transition probabilities $p_i(j)$, the probability, when in state i, of going to state j. Then the reverse process has the same state probabilities, and its transition probabilities are given by:

$$q_j(i) = \frac{P_i\,p_i(j)}{P_j}.$$


This is true since this $q_j(i)$ is merely the a posteriori probability
for the original process that when in state j the preceding state 
was state i. The inverse of Fig. 1 is shown in Fig. 2. 

It is interesting to show directly that the entropy $H_R$ of the reverse process is equal to the entropy $H_F$ of the forward process. Of course, this must be true a posteriori from the general properties of entropy. We have

$$P_j\,q_j(i) = P_i\,p_i(j).$$

Hence

$$\sum_{i,j} P_j q_j(i) \log P_j q_j(i) = \sum_{i,j} P_i p_i(j) \log P_i p_i(j),$$

$$\sum_{i,j} P_j q_j(i) \log q_j(i) + \sum_{i,j} P_j q_j(i) \log P_j = \sum_{i,j} P_i p_i(j) \log p_i(j) + \sum_{i,j} P_i p_i(j) \log P_i,$$

$$-H_R + \sum_j P_j \log P_j = -H_F + \sum_i P_i \log P_i,$$

and therefore $H_R = H_F$.
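The reversal formula can be checked on a small example. The sketch uses a hypothetical three-state ergodic chain. It finds the state probabilities $P_i$ by power iteration and builds $q_j(i) = P_i p_i(j)/P_j$. It then confirms that the reversed chain is a proper Markoff process with the same state probabilities and the same entropy. All names are illustrative.

```python
from math import log2
from typing import List

Matrix = List[List[float]]   # row i holds the transition probabilities p_i(j)


def stationary(p: Matrix, steps: int = 2000) -> List[float]:
    """State probabilities P_i by power iteration (assumes an ergodic chain)."""
    k = len(p)
    probs = [1.0 / k] * k
    for _ in range(steps):
        probs = [sum(probs[i] * p[i][j] for i in range(k)) for j in range(k)]
    return probs


def reverse(p: Matrix, probs: List[float]) -> Matrix:
    """Row j holds q_j(i) = P_i p_i(j) / P_j, the reversed transition probabilities."""
    k = len(p)
    return [[probs[i] * p[i][j] / probs[j] for i in range(k)] for j in range(k)]


def entropy(p: Matrix, probs: List[float]) -> float:
    """H = -sum_i P_i sum_j p_i(j) log2 p_i(j), in bits per transition."""
    return -sum(probs[i] * pij * log2(pij)
                for i, row in enumerate(p) for pij in row if pij > 0)


if __name__ == "__main__":
    forward = [[0.1, 0.9, 0.0],      # hypothetical example chain
               [0.4, 0.0, 0.6],
               [0.5, 0.5, 0.0]]
    P = stationary(forward)
    backward = reverse(forward, P)
    print(entropy(forward, P), entropy(backward, P))
```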



Outline of Talk 
American Statistical Society, December 28, 1949 


C. E. Shannon

Bell Telephone Laboratories, Inc., Murray Hill, N. J.

1. Information Produced by a Stochastic Process

In communication engineering, we are interested in transmitting messages from one point to another. The messages generally consist of a sequence of individual symbols, such as the letters of printed English, which are governed by probabilities. Thus, in English, there are the various letter frequencies, digram frequencies, etc. The "meaning" of the message (if any) is irrelevant to the engineering problem. Abstractly, then, we may consider a message to be a sequence of meaningless symbols produced by a suitable stochastic process.
Communication systems must be designed to handle the ensemble 
of possible messages; the particular one which will actually 
occur is not known when the system is constructed. The source 
producing messages is assumed to have only a finite number of 
possible internal states. 

2. Entropy as a Measure of Information

A suitable measure of the amount of information produced by a discrete stochastic process is given by the entropy H, where

$$H = -\lim_{N \to \infty} \frac{1}{N} \sum p(x_1, \dots, x_N) \log_2 p(x_1, \dots, x_N),$$

in which $x_1, \dots, x_N$ is a sequence of N symbols produced by the process, $p(x_1, \dots, x_N)$ is the probability of this sequence, and the sum is over all sequences of this length.

The significance of the quantity H is that it is possible to translate messages from a source with entropy H into a sequence of binary digits (0 or 1) using, on the average, H + ε binary digits per letter of the original message, for any positive ε. It is not possible to translate so that fewer than H are used on the average. Thus H measures, in a sense, the equivalent number of binary digits per letter of message. It can be shown that H also determines the amount of channel capacity required for transmission of the original messages.

It is also possible to define a conditional entropy, $H_x(y)$, of one source relative to another. This measures, in a sense, the uncertainty per letter of the y sequence when the x sequence is known. Equivalently, it is the amount of additional information in the y sequence over that available in the x sequence. $H_x(y)$ can be defined as follows:

$$H_x(y) = H(x, y) - H(x),$$

where H(x, y) is the entropy of the sequence whose elements are the ordered pairs (x, y).
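The definition $H_x(y) = H(x, y) - H(x)$ can be computed directly from a joint probability table. To keep the sketch short it assumes memoryless sources. Under that assumption the per-letter entropies of single pairs equal the rates discussed in the text. The example channel, which flips 10% of the digits, is hypothetical.

```python
from collections import defaultdict
from math import log2
from typing import Dict, Tuple

Joint = Dict[Tuple[object, object], float]   # P(x, y) keyed by the pair (x, y)


def entropy(dist: Dict) -> float:
    """H = -sum p log2 p over a table of probabilities, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint: Joint, index: int) -> Dict[object, float]:
    """Marginal distribution of coordinate `index` (0 for x, 1 for y)."""
    m: Dict[object, float] = defaultdict(float)
    for pair, p in joint.items():
        m[pair[index]] += p
    return m


def conditional_entropy(joint: Joint) -> float:
    """H_x(y) = H(x, y) - H(x)."""
    return entropy(joint) - entropy(marginal(joint, 0))


if __name__ == "__main__":
    # hypothetical example: y is x sent through a channel that flips 10% of bits
    noisy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    print(conditional_entropy(noisy))
```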

3. The Nature of Information

While the entropy H measures the amount of information produced by a stochastic process, it does not define the information itself. Thus two entirely different sources might produce information at the same rate (same H), but certainly they are not producing the same information. If we translate the output of a particular source into a different "language" by a reversible operation, the translation may be said to have the same information as the original. We are therefore led to consider the information of a stochastic process as that which is common to all translations obtained from the given process by members of the group of reversible translations. Alternatively, it is the equivalence class of all processes obtainable from the given one by such translations. Some paradoxical situations involve infinite internal storage in the transducer doing the translating. To avoid them, it is desirable first to limit the group G to translations possible in transducers having a finite number of possible internal states. The information associated with a process may be denoted by a single letter, say X. Thus X = Y means that Y can be obtained by a translation of X, and conversely. It is possible to set up a metric satisfying the usual postulates as follows:

$$\rho(x, y) = 2H(x, y) - H(x) - H(y).$$

With this metric it is possible to define limiting sequences of elements, each of which is an information. Thus a Cauchy sequence $X_1, X_2, \dots$ is defined by requiring that

$$\lim_{m, n \to \infty} \rho(X_m, X_n) = 0.$$

The introduction of these sequences as new elements (analogous to irrational numbers) completes the space in a satisfactory way and enables one to simplify the statement of various results.
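The metric can be computed in the same memoryless, single-letter setting; that setting is an assumption made to keep the example small. The sketch shows two things. A reversible relabeling of a source lies at distance 0 from the source, as a translation should. Two independent fair binary digits lie at distance 2, and in general $\rho(x, y) = H_x(y) + H_y(x)$. The helper names are illustrative.

```python
from collections import defaultdict
from math import log2


def entropy(dist) -> float:
    """H = -sum p log2 p over a table of probabilities, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, index):
    """Marginal distribution of coordinate `index` of each key pair."""
    m = defaultdict(float)
    for pair, p in joint.items():
        m[pair[index]] += p
    return m


def rho(joint) -> float:
    """rho(x, y) = 2 H(x, y) - H(x) - H(y) = H_x(y) + H_y(x)."""
    return 2 * entropy(joint) - entropy(marginal(joint, 0)) - entropy(marginal(joint, 1))


if __name__ == "__main__":
    relabel = {("a", 0): 0.2, ("b", 1): 0.3, ("c", 2): 0.5}    # y is a translation of x
    indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}     # independent fair digits
    print(rho(relabel), rho(indep))
```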
4. The Information Lattice

A relation of inclusion, x ≥ y, between two information elements x and y can be defined by

$$x \ge y \iff H_x(y) = 0.$$

This essentially requires that y can be obtained by a suitable finite state operation (or limit of such operations) on x. If x ≥ y, we call y an abstraction of x. If x ≥ y and y ≥ z, then x ≥ z. If x ≥ y, then H(x) ≥ H(y). Also, x > y means x ≥ y, x ≠ y. The 0 element is the information element one of whose translations is the process which always produces the same symbol, and x ≥ 0 for any x.

The sum of two information elements, z = x + y, is the process which produces the ordered pairs $(x_n, y_n)$. We have

$$z \ge x, \quad z \ge y,$$

and there is no u < z with these properties; z is the least upper bound of x and y.

The product z = xy is defined as the largest z such that x ≥ z, y ≥ z; that is, there is no u > z which is an abstraction of both x and y. The product is unique.

With these definitions, information elements form a metric lattice. The lattice is not distributive, nor even modular. A non-distributive example is the following: x and y are independent sequences of binary digits, and z is the sequence obtained by mod 2 addition of corresponding symbols in x and y. Then

$$zy + zx = 0 + 0 = 0,$$

$$z(x + y) = z \neq 0.$$
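The entropy facts behind the non-distributive example can be checked directly. z is independent of x and of y, so it shares no information with either one, which matches zx = zy = 0. On the other hand $H_{x+y}(z) = 0$, so x + y ≥ z and z(x + y) = z. The sketch looks at one letter of each sequence. Treating the sequences as memoryless is an assumption, and all names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2

# x and y independent fair binary digits, z = x + y mod 2; the four
# outcomes (x, y, z) are equally likely.
OUTCOMES = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
X, Y, Z = 0, 1, 2


def H(*coords: int) -> float:
    """Joint entropy in bits of the chosen coordinates."""
    counts = Counter(tuple(o[c] for c in coords) for o in OUTCOMES)
    n = len(OUTCOMES)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(a: int, b: int) -> float:
    """I(a; b) = H(a) + H(b) - H(a, b)."""
    return H(a) + H(b) - H(a, b)


if __name__ == "__main__":
    print("I(z; x) =", mutual_information(Z, X))   # z independent of x
    print("I(z; y) =", mutual_information(Z, Y))   # z independent of y
    print("H_{x+y}(z) =", H(X, Y, Z) - H(X, Y))    # z determined by (x, y)
```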

The lattices are relatively complemented: there exists for x ≤ y a z with

$$z + x = y, \quad zx = 0.$$

The element z is not, in general, unique.

5. The Delay-Free Group

The definition of equality for information based on the group G allows x = y when y is, for example, a delayed version of x: $y_n = x_{n-1}$. In some situations, when one must act on information at a certain time, a delay is not permissible. In such a case we may consider the more restricted group of instantaneously reversible translations. One may define inclusion, sum, product, etc., in an analogous way. This also leads to a lattice, but one of much greater complexity and with many different invariants.

Proof of an Integration Formula 

C. E. Shannon 

The integral

$$f_N(a) = \int_0^a \frac{\sin^2 Nx}{\sin^2 x}\,dx \qquad (1)$$

has arisen in an acoustical problem. R. C. Jones has evaluated it for N = 1, 2, 3, 4 as equal to

$$g_N(a) = Na + \sum_{i=1}^{N-1} \frac{N-i}{i}\,\sin 2ia \qquad (2)$$

and has conjectured that $f_N = g_N$ for all a, N. A general proof follows.

From (1) we have

$$\Delta^2 f_N = f_N - 2f_{N-1} + f_{N-2} = -\frac{1}{2}\int_0^a \frac{\cos 2Nx - 2\cos 2(N-1)x + \cos 2(N-2)x}{\sin^2 x}\,dx,$$

$$\frac{d}{da}\,\Delta^2 f_N(a) = -\frac{\cos 2Na - 2\cos 2(N-1)a + \cos 2(N-2)a}{2\sin^2 a}. \qquad (3)$$

Also from (2)

$$\Delta g_N = g_N - g_{N-1} = a + \sum_{i=1}^{N-1} \frac{\sin 2ia}{i},$$

$$\Delta^2 g_N = \frac{\sin 2(N-1)a}{N-1},$$

$$\frac{d}{da}\,\Delta^2 g_N(a) = 2\cos 2(N-1)a. \qquad (4)$$

The equality of (3) and (4) can be established by noting that the numerator of (3) is

$$\begin{aligned}
\cos 2Na - 2\cos 2(N-1)a + \cos 2(N-2)a
&= \operatorname{Re}\left[e^{j2Na} - 2e^{j2(N-1)a} + e^{j2(N-2)a}\right] \\
&= \operatorname{Re}\left[e^{j2(N-1)a}\left(e^{j2a} - 2 + e^{-j2a}\right)\right] \\
&= \operatorname{Re}\left[e^{j2(N-1)a}\,(2j\sin a)^2\right] \\
&= -\operatorname{Re}\left[4\sin^2 a\; e^{j2(N-1)a}\right] = -4\sin^2 a \cos 2(N-1)a.
\end{aligned}$$

But $\Delta^2 g_N(0) = \Delta^2 f_N(0) = 0$, so that

$$\Delta^2 g_N(a) = \Delta^2 f_N(a).$$

It has also been verified that

$$g_1(a) = f_1(a), \qquad g_2(a) = f_2(a).$$

Hence it follows in general that $g_N(a) = f_N(a)$.
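The identity can also be spot-checked numerically. The sketch below is not part of the proof. It integrates $\sin^2 Nx / \sin^2 x$ from 0 to a by Simpson's rule and compares the result with the closed form $Na + \sum_{i=1}^{N-1} \frac{N-i}{i} \sin 2ia$ for a few values of N. The step count and the test value of a are arbitrary choices.

```python
from math import sin


def integrand(x: float, N: int) -> float:
    """sin^2(Nx) / sin^2(x), using its limiting value N^2 where sin x = 0."""
    s = sin(x)
    if s == 0.0:
        return float(N * N)
    return (sin(N * x) / s) ** 2


def f(N: int, a: float, intervals: int = 2000) -> float:
    """f_N(a) = integral from 0 to a of sin^2(Nx)/sin^2(x) dx, by Simpson's rule."""
    h = a / intervals                     # `intervals` must be even for Simpson's rule
    total = integrand(0.0, N) + integrand(a, N)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * integrand(k * h, N)
    return total * h / 3.0


def g(N: int, a: float) -> float:
    """Closed form g_N(a) = N a + sum_{i=1}^{N-1} ((N - i)/i) sin 2ia."""
    return N * a + sum((N - i) / i * sin(2 * i * a) for i in range(1, N))


if __name__ == "__main__":
    a = 1.3                               # an arbitrary test value
    for N in range(1, 7):
        print(N, f(N, a), g(N, a))
```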

A ... of Transmitting Information

It is possible by various types of modulation to improve one aspect of a system for transmitting information at the expense of others. The various parameters which may be exchanged are:

1. Quality of received signal, which can be roughly measured ...

2. ...

3. Time of transmission.

4. Noise and the ...

A general theory of how these variables are related and the ... will be developed in a forthcoming memorandum. However, speaking roughly and under a number of ... assumptions, ... the following equations:

...

where

Q = a measure of distortion at the receiver,
T = time of transmission,
W = band width of transmitter,
N = noise power density, that is, the noise power per unit band ...

... keeping received quality fixed, ... in various ways as long as ... is a function of the ...

...

where ... are the ... transmitter power and noise power ... during the transmission time. ... for example by increasing band width ... the exchange is in some cases very favorable, since it is a logarithmic one ...

There are two methods of bettering signal-to-noise ratio at the expense of band width. Neither of these, however, is by any means efficient in the exchange. The present memorandum describes a new method in which essentially the maximum saving of signal power is achieved for a given band width increase. This does not mean that the system of transmission is a theoretically ideal one, for there are several other means of improving received quality ... but this one seems to yield a nearly ideal exchange rate between the ...

The method consists of sending the values of the input ... function (the speech function in telephone and radio) at a sequence of regularly spaced sampling points ...

Thus 13 = 8 + 4 + 2 − 1,
−5 = −8 + 4 − 2 + 1.

A transmitter for this system could be built in the following way. A condenser is charged as usual to the sampled voltage. This voltage is read on a comparator biased up to half the ... If the comparator gives a positive indication, an electronic switch is closed, feeding a negative pulse of $2^n$ units of charge into the condenser; if not, a positive pulse of $2^n$ units is fed in. The comparator is now switched to control a new pulse source which produces pulses of $2^{n-1}$ units, and the process is repeated. Thus the circuit feeds in positive or negative pulses of decreasing magnitude, "hunting" for a balance. At each stage a recorder remembers whether a positive or a negative pulse was used. These positive and negative recordings are actually the binary representation of the original voltage, as one can see by reading the above table with minus signs replaced by 0. Hence the receiver of Fig. 4 can be used without alteration in this
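The "hunting" procedure amounts to a signed-binary successive-approximation encoder. The sketch models it under several simplifying assumptions. The sample is an odd integer v with |v| < 2**n. The comparator simply reports the sign of the charge that remains. The pulses run from 2**(n-1) down to 1. Under these assumptions the recorded signs satisfy $v = \sum s_k 2^k$, and reading each −1 as 0 gives the ordinary binary digits of (v + 2**n − 1)/2. The names `hunt_encode` and `signs_to_bits` are hypothetical.

```python
from typing import List


def hunt_encode(v: int, n: int) -> List[int]:
    """Signs recorded by the 'hunting' circuit, most significant first.

    Assumes v is an odd integer with |v| < 2**n and that the comparator
    simply reports the sign of the charge remaining on the condenser.
    """
    residual = v
    signs = []
    for k in range(n - 1, -1, -1):
        s = 1 if residual > 0 else -1   # comparator indication
        signs.append(s)
        residual -= s * 2 ** k          # pulse of 2**k units of the opposite sign
    if residual != 0:
        raise ValueError("v must be odd with |v| < 2**n")
    return signs


def signs_to_bits(signs: List[int]) -> List[int]:
    """Read the recorded signs as binary digits, with -1 replaced by 0."""
    return [1 if s > 0 else 0 for s in signs]


if __name__ == "__main__":
    for v in (13, -5):
        signs = hunt_encode(v, 4)
        print(v, signs, signs_to_bits(signs))
```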


Creative Thinking 


Suppose we plot the amount of ideas produced, useful good ideas, by a number of people arranged in order of increasing ability at producing ideas. We find a curve something like this, going up to an enormous height here.

A very small percentage of the population produces the greatest proportion of the important ideas. This is akin to an idea presented by an English mathematician, Turing, that the human brain is something like a piece of uranium. If the uranium is below the critical mass and you shoot one neutron into it, a few additional neutrons would be produced by impact. But if you increase the size of the uranium beyond the critical mass, the result is extremely explosive. Turing says this is something like ideas in the human brain. There are some people who, if you shoot one idea into the brain, will give you half an idea out. There are other people who are beyond this point, at which they produce two ideas for each idea sent in. Those are the people beyond the knee of the curve. I don't want to sound egotistical here; I don't think that I am beyond the knee of this curve and I don't know anyone who is. I do know some people that were. I think, for example, that anyone
will agree that Isaac Newton would be well on the top of this 
curve. When you think that at the age of 25 he had produced enough 


science, physics and mathematics to make 10 or 20 men famous - he 
produced binomial theorem, differential and integral calculus, laws 
of gravitation, laws of motion, decomposition of white light, and 
so on. Now what is it that shoots one up to this part of the curve? What are the basic requirements? I think we
could set down three things that are fairly necessary for scien- 
tific research or for any sort of inventing or mathematics or 
physics or anything along that line. I don't think a person can 
get along without any one of these three. 

The first one is obvious - training and experience. You don't expect a lawyer, however bright he may be, to give you a new theory of physics these days or mathematics or engineering.

The second thing is a certain amount of intelligence or talent. In other words, you have to have an IQ that is fairly high to do good research work. I don't think that there is any good engineer or scientist that can get along on an IQ of 100, which is the average for human beings. In other words, he has to have an IQ higher than that. Everyone in this room is considerably above that. Training, we might say, is a matter of environment; intelligence is a matter of heredity.

Those two I don't think are sufficient. I think there is 
a third constituent here, a third component which is the one that 
makes an Einstein or an Isaac Newton. For want of a better word, 
we will call it motivation. In other words, you have to have some 
kind of a drive, some kind of a desire to find out the answer, a 
desire to find out what makes things tick. If you don't have that, 
you may have all the training and intelligence in the world, but you won't have the questions and you just won't find the answers. This is a hard thing to put your finger on. It is probably a matter of temperament; that is, a matter of early training and early childhood experiences, whether you will be motivated in the direction of scientific research. I think that at a superficial level it is a blend of several things. This is not any attempt at a deep analysis at all, but my feeling is that a good scientist has a great deal of what we can call curiosity. I won't go any deeper into it than that. He wants to know the answers. He's just curious how things tick, and he wants to know the answers to questions; and if he sees things, he wants to raise questions and he wants to know the answers to those.

Then there's the idea of dissatisfaction. By this I don't 
mean a pessimistic dissatisfaction of the world - we don't like the 
way things are - I mean a constructive dissatisfaction. The idea 
could be expressed in the words, "This is OK, but I think things could be done better. I think there is a neater way to do this. I think things could be improved a little." In other words, there is continually a slight irritation when things don't look quite right; and I think that dissatisfaction in present days is a key driving force
in good scientists. 

And another thing I'd put down here is the pleasure in seeing neat results or neat methods of arriving at results, neat designs of engineering equipment, and so on. I get a big bang myself out of proving
a theorem. If I've been trying to prove a mathematical theorem for 
a week or so and I finally find the solution, I get a big bang out of 
it. And I get a big kick out of seeing a clever way of doing some 

engineering problem, a clever design for a circuit which uses a very 
small amount of equipment and gets apparently a great deal of result 
out of it. I think so far as motivation is concerned, it is maybe a 

little like Fats Waller said about swing music - either you got it or 


you ain't. If you ain't got it, you probably shouldn't be doing re- 
search work if you don't want to know that kind of answer. Although 
people without this kind of motivation might be very successful in 
other fields, the research man should probably have an extremely 
strong drive to want to find out the answers, so strong a drive that 
he doesn't care whether it is 5 o'clock - he is willing to work all 
night to find out the answers and all weekend if necessary. Well 
now, this is all well and good, but supposing a person has these 
three properties to a sufficient extent to be useful, are there any 
tricks, any gimmicks that he can apply to thinking that will actually 
aid in creative work, in getting the answers in research work, in gen- 
eral, in finding answers to problems? I think there are, and I think 
they can be catalogued to a certain extent. You can make quite a list 
of them and I think they would be very useful if one did that, so I 
am going to give a few of them which I have thought up or which people have suggested to me. And I think if you consciously applied these to various problems you had to solve, in many cases you'd find solutions quicker than you would normally, or in cases where you might not find them at all. I think that good research workers apply these things unconsciously; that is, they do these things automatically. If they brought them forth into conscious thinking (here's a situation where I would try this method of approach), they would probably get there faster, although I can't document this statement.

The first one that I might speak of is the idea of sim- 
plification. Suppose that you are given a problem to solve, I don't 
care what kind of a problem - a machine to design, or a physical 
theory to develop, or a mathematical theorem to prove, or some- 
thing of that kind - probably a very powerful approach to this 
is to attempt to eliminate everything from the problem except the 
essentials; that is, cut it down to size. Almost every problem 
that you come across is befuddled with all kinds of extraneous 
data of one sort or another; and if you can bring this problem 
down into the main issues, you can see more clearly what you're 
trying to do and perhaps find a solution. Now, in so doing, you 
may have stripped away the problem that you're after. You may have 
simplified it to a point that it doesn't even resemble the problem 
that you started with; but very often if you can solve this simple 
problem, you can add refinements to the solution of this until you 
get back to the solution of the one you started with. 

A very similar device is seeking similar known problems. I think I could illustrate this schematically in this way. You have a problem P here, and there is a solution S which you do not know yet, perhaps over here. If you have experience in the field you are working in, you may perhaps know of a somewhat similar problem, call it P', which has already been solved and which has a solution, S'. All you need to do - all you may have to do - is to find the analogy from P' here to P and the same analogy from S' to S in order to get back to the solution of the given problem. This is the reason why experience in a field is so important
that if you are experienced in a field, you will know thousands of 
problems that have been solved. Your mental matrix will be filled
with P's and S's unconnected here and you can find one which is 
tolerably close to the P that you are trying to solve and go over 
to the corresponding S' in order to go back to the S you're after. 
It seems to be much easier to make two small jumps than the one big 
jump in any kind of mental thinking. 

Another approach for a given problem is to try to restate 
it in just as many different forms as you can. Change the words. 
Change the viewpoint. Look at it from every possible angle. After 
you've done that, you can try to look at it from several angles at 
the same time and perhaps you can get an insight into the real basic 
issues of the problem, so that you can correlate the important fac- 
tors and come out with the solution. It's difficult really to do 
this, but it is important that you do. If you don't, it is very 
easy to get into ruts of mental thinking. You start with a problem
here and you go around a circle here, and if you could only get over
to this point, perhaps you would see your way clear; but you can't
break loose from certain mental blocks which are holding you in
certain ways of looking at a problem. That is the reason why very
frequently someone who is quite green to a problem will come in and
look at it and find the solution like that, while you have been
laboring for months over it. You've got set into some ruts of mental
thinking here, and someone else comes in and sees it from a fresh
viewpoint.

Another mental gimmick for aid in research work, I think, 
is the idea of generalization. This is very powerful in mathematical
research. The typical mathematical theory develops in the following
way: someone proves a very isolated, special result, a particular
theorem, and someone else will always come along and start
generalizing it. Where it was in two dimensions before, he will do
it in N dimensions; or if it was in some kind of algebra, he will
work in a general algebraic field; if it was in the field of real
numbers, he will change it to a general algebraic field or something
of that sort. This is actually quite easy to do if you only remember
to do it. The minute you've found an answer to something, the next
thing to do is to ask yourself if you can generalize this any more:
can I make a broader statement which includes more? There, I think,
in terms of engineering, the same thing should be kept in mind. If
somebody comes along with a clever way of doing something, one
should ask oneself "Can I apply the same principle in more general
ways? Can I use this same clever idea represented here to solve a
larger class of problems? Is there any place else that I can use
this particular thing?"

Next one I might mention is the idea of structural analysis
of a problem. Suppose you have your problem here and a solution
here. You may have too big a jump to take. What you can try to
do is to break down that jump into a large number of small jumps.
If this were a set of mathematical axioms and this were a theorem
or conclusion that you were trying to prove, it might be too much
for me to try to prove this thing in one fell swoop. But perhaps
I can visualize a number of subsidiary theorems or propositions
such that if I could prove those, in turn I would eventually arrive
at this solution. In other words, I set up some path through this
domain with a set of subsidiary solutions, 1, 2, 3, 4, and so on,
and attempt to prove this on the basis of that and then this on the
basis of these which I have proved, until eventually I arrive at S.
Many proofs in mathematics have actually been found by extremely
roundabout processes. A man starts to prove this theorem and he
finds that he wanders all over the map. He starts off and proves a
good many results which don't seem to be leading anywhere and then
eventually ends up by the back door on the solution of the given
problem; and very often when that's done, when you've found your
solution, it may be very easy to simplify; that is, to see that at
one stage you might have short-cut across here, and that you might
have short-cut across there. The same thing is true in design work.
You may design a way of doing something which is obviously clumsy
and cumbersome and uses too much equipment; but after you've really
got something you can get a grip on, something you can hang on to,
you can start cutting out components and seeing that some parts
were really superfluous. You really didn't need them in the first
place.


Now one other thing I would like to bring out, which I
run across quite frequently in mathematical work, is the idea of
inversion of the problem. You are trying to obtain the solution
S on the basis of the premises P, and you can't do it. Well,
turn the problem over: suppose that S were the given proposition,
the given axioms, or the given numbers in the problem, and what you
are trying to obtain is P. Just imagine that that were the case.
Then you will find that it is relatively easy to solve the problem
in that direction. You find a fairly direct route. If so, it's
often possible to invert it in small batches. In other words, you've
got a path marked out here, going this way. You can see how to
invert these things in small stages, and perhaps there are only
three or four difficult steps in the proof.

Now I think the same thing can happen in design work. 
Sometimes I have had the experience of designing computing machines 
of various sorts in which I wanted to compute certain numbers out of 
certain given quantities. This happened to be a machine that played 
the game of nim and it turned out that it seemed to be quite diffi- 
cult. It took quite a number of relays to do this particular calcu- 
lation although it could be done. But then I got the idea that if 
I inverted the problem, it would have been very easy to do - if the 
given and required results had been interchanged; and that idea led 
to a way of doing it which was far simpler than the first design. 
The way of doing it was by feedback; that is, you start with
the required result and run it back, run it through its values,
until it matches the given input. So the machine itself worked
backward, running S over its range of numbers until it had the
number that you actually had; at that point, having reached the
number that matches P, it shows you the correct way. Well, so
much for this philosophy, which is probably very boring to most
of you. I'd like now to show you this machine which I brought
along and go
into one or two of the problems which were connected with the design 
of that because I think they illustrate some of these things I've been 
talking about. 
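The inversion idea has a direct software analogue. Shannon does not give the nim machine's logic here, so the following is only a hypothetical Python sketch of "run the required result over its range until it matches the input", using the standard rule for nim (a position whose pile sizes XOR to zero is lost for the player about to move). The function names are invented for illustration, and this is not a reconstruction of his relay circuit.

```python
from functools import reduce
from operator import xor


def nim_sum(piles):
    """XOR of all pile sizes; zero means the player to move loses with best play."""
    return reduce(xor, piles, 0)


def winning_move(piles):
    """Search backward from the required result instead of computing the move:
    for each pile, run the candidate new size over its range until the resulting
    position has nim-sum zero.  Returns (pile_index, new_size), or None if the
    position is already lost for the player to move."""
    for i, size in enumerate(piles):
        for new_size in range(size):
            candidate = list(piles)
            candidate[i] = new_size
            if nim_sum(candidate) == 0:
                return i, new_size
    return None


print(winning_move([3, 4, 5]))  # (0, 1): reduce the 3-pile to 1
```

The test applied at each step (is the nim-sum zero?) is much simpler than a direct formula for the move, at the cost of trying candidates one at a time.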

In order to see this, you'll have to come up around it; so,
I wonder whether you will all come up around the table now. 

Bell Telephone Laboratories 

Cover Sheet for Technical Memorandum 

subject The Relay Circuit Analyzer - Case 22103
date March 31, 1953
author C. E. Shannon, E. F. Moore
MM-53-1800-17
Switching Theory

1 - Patent Dept. (2)
2 - R. Bown
3 - W. H. Doherty
4 - H. H. Abbott
5 - A. O. Adam
6 - A. E. Anderson
7 - E. G. Andrews
8 - M. M. Atalla
9 - H. W. Bode
10 - C. Breen
11 - C. E. Brooks
12 - E. Bruce
13 - A. Burkett
14 - A. J. Busch
15 - R. L. Carmichael
16 - A. B. Clark
17 - C. Clos
18 - R. C. Davis
19 - J. W. Dehn
20 - T. C. Dimond
21 - K. S. Dunlap
22 - F. S. Entz
23 - J. H. Felker
24 - J. G. Ferguson
25 - E. B. Ferrell
26 - G. E. Fessler
27 - W. O. Fleckenstein
28 - J. B. Fisk
29 - G. R. Frost
30 - T. C. Fry
31 - E. N. Gilbert
32 - G. W. Gilman
33 - K. Goldschmidt
34 - R. E. Hersey
35 - B. D. Holbrook
36 - A. W. Horton, Jr.
37 - L. W. Hussey
38 - P. Husta
39 - A. E. Joel, Jr.
40 - M. Karnaugh

41 - A. C. Keller
42 - W. Keister
43 - G. V. King
44 - F. A. Korn
45 - W. J. Laggy
46 - C. Y. Lee
47 - E. C. Lee
48 - W. D. Lewis
49 - C. A. Lovell
50 - F. K. Low
51 - A. A. Lundstrom
52 - M. E. Maloney
53 - C. H. McCandless
54 - B. McKim
55 - B. McMillan
56 - B. McWhan
57 - G. H. Mealy
58 - G. Miller
59 - F. Moore
60 - R. J. Murphy
61 - B. Myers
62 - D. Newby
63 - J. A. Pullis
64 - S. T. Rea
65 - N. E. Ritchie
66 - G. W. Roberts
67 - P. Runyon
68 - M. Ryder
69 - N. Seckler
70 - C. E. Shannon
71 - S. Shapiro
72 - R. F. Shipley
73 - H. J. Singer
74 - C. J. Stacy
75 - H. E. Staehler
76 - B. E. Sumner
77 - W. Tatum
78 - G. Tryon
79 - H. Washburn
80 - F. Watson
81 - G. Wilson
82 - L. Wright

(See next page for Abstract) 

MM- 52 -1400-9 
M- 53 -1300-17 
March 31, 1953 


This memorandum describes a machine (made of 
relays, selector switches, gas diodes, and germanium diodes) 
for analyzing several properties of any combinational relay 
circuit which uses four relays or fewer. 

This machine, called the relay circuit analyzer, 
contains an array of switches on which the specifications 
that the circuit is expected to satisfy can be indicated, as 
well as a plugboard on which the relay circuit to be analyzed 
can be set up. 

The analyzer can (1) verify whether the circuit
satisfies the specifications, (2) make certain kinds of
attempts to reduce the number of contacts used, and also
(3) perform rigorous mathematical proofs which give lower
bounds for the numbers and types of contacts required to
satisfy given specifications.

The Relay Circuit Analyzer - Case 22103 

MM- 53 -11-00-9 
March 31, 1953 


1. Introduction 

Some operations which assist in the design of relay 
circuits or other types of switching circuits can be described 
in very simple form, and machines can be constructed which per- 
form them more quickly and more accurately than a human being 
can. It seems possible that machines of this type will be use- 
ful to those whose work involves the design of such circuits. 
This is the first of two memoranda describing particular mach- 
ines of this kind which have been built. 

The present machine, called the relay circuit
analyzer, is intended for use in connection with the design of
two-terminal circuits made up of contacts on at most four relays.

The principles upon which this machine is based are
not limited to two-terminal networks or to four relays, although
an enlarged machine would require more time to operate. Each
addition of one relay to the circuits considered would approxi-
mately double the size of the machine and quadruple the length
of time required for its operation.

This type of machine is not applicable to sequential
circuits, however, so it will be of use only in connection with
parts of the relay circuits which contain contacts, but no relay
coils.

2. Operation of the Machine 

The machine, as can be seen from Photograph 196492,
contains sixteen 3-position switches, which are used to specify
the requirements of the circuit. One switch corresponds to each
of the 2^4 = 16 states in which the four relays can be put. Switch
No. 2 in the upper right-hand corner, for instance, is labeled
W + X + Y' + Z, which corresponds to the state of the circuit
in which the relays labeled W, X, and Z are operated, and the
relay labeled Y is released.
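The labeling in this example follows a simple rule, which the Python sketch below states explicitly (the function names are hypothetical): every relay contributes its letter, primed when the relay is released. In the hindrance notation, where an operated relay's variable is 0, this sum is 0 exactly in the labeled state.

```python
from itertools import product

RELAYS = "WXYZ"


def state_label(state):
    """Switch label for a relay state: operated relays appear as plain letters,
    released relays as primed letters (the convention of the W + X + Y' + Z example).
    `state` maps each relay name to True (operated) or False (released)."""
    return " + ".join(r if state[r] else r + "'" for r in RELAYS)


def all_states():
    """Yield all 2**4 = 16 operated/released combinations of W, X, Y, Z."""
    for bits in product([False, True], repeat=len(RELAYS)):
        yield dict(zip(RELAYS, bits))


print(state_label({"W": True, "X": True, "Y": False, "Z": True}))  # W + X + Y' + Z
```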

The three positions of this switch correspond to the
requirements which can be imposed on the condition of the cir-
cuit when the relays are in the corresponding state. Since any
single relay contact circuit assumes only one of two values
(open or closed), the inclusion of a third value (doesn't matter,
don't care, or vacuous, as it has been called by various per-
sons) merits some explanation. If the machine, of which the
relay circuit being designed is to be a part, only permits these
relays to take on a fraction of the 2^n combinations of which n
relays are capable, then (except when considering what the mach-
ine will do in case of relay failures) any circuits which agree
on the combinations actually assumed will be equivalent in their
properties. Since the class of circuits which agree with what 
is wanted just in the necessary combinations is larger than the 
class of those which agree in all combinations, the former 
class can and frequently will contain members using fewer con- 
tacts. Hence the switch corresponding to each state is put 
into the don't care position if the circuit will never assume 
that state, or if for any other reason the behavior when in 
that state is immaterial. The sixteen 3-position switches thus 
permit the user not only to require the circuit under consid- 
eration to have exactly some particular hindrance function, but 
also allow the machine more freedom in the cases where the cir- 
cuit need not be specified completely. 
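The extra freedom can be counted: with k of the sixteen switches at don't care, 2^k different hindrance functions all meet the requirements, and a designer may choose whichever needs the fewest contacts. A minimal Python sketch, assuming a specification is written as a list of sixteen switch settings (a representation chosen here for illustration, with hypothetical names):

```python
from itertools import product

OPEN, CLOSED, DONT_CARE = "open", "closed", "don't care"


def consistent_functions(spec):
    """Yield every fully specified open/closed table that agrees with `spec`
    (one entry per relay state) wherever `spec` is not DONT_CARE."""
    free = [i for i, s in enumerate(spec) if s == DONT_CARE]
    for choice in product([OPEN, CLOSED], repeat=len(free)):
        table = list(spec)
        for i, value in zip(free, choice):
            table[i] = value
        yield tuple(table)


# Hypothetical specification: 10 states closed, 3 open, 3 never reached.
spec = [CLOSED] * 10 + [OPEN] * 3 + [DONT_CARE] * 3
print(sum(1 for _ in consistent_functions(spec)))  # 8 = 2**3 acceptable functions
```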

In order to make a machine of this type to deal
with n relays (this particular machine was made for the case
n = 4), 2^n such switches would be required, corresponding to
the 2^n states n relays can assume. In each of these states
the circuit can be either open or closed, so there are 2^(2^n)
functionally distinct circuits. But since each switch has
3 positions, there are 3^(2^n) distinct circuit requirements
specifiable on the switches, which in the case n = 4 amounts to
43,046,721. Thus, the number of problems which the analyzer
must deal with is quite large, even in the case of only four
relays.
The left half of the front panel of the machine (See 
Photograph No. 196492) is a plugboard on which the circuit be- 
ing analyzed can be represented. There are three transfers 
from each of the four relays, W, X, Y, and Z brought out to 
jacks on this panel, and two plugs representing the terminals 
of the network are at the top and bottom. Using these, as 
well as some patch cords, it is possible to plug up any cir- 
cuit using at most three transfers on each of the four relays. 
This number of contacts is sufficient to give a circuit repre- 
senting any switching function of four variables. 

If the specifications for the circuit have been put
on the sixteen switches, and if the circuit has been put on
the plugboard, the relay circuit analyzer is then ready to
operate. With the main control switch and the evaluate-com-
pare switch both in the evaluate position, pressing the start
button will cause the analyzer to evaluate the circuit plugged
in, and to indicate in which of the states the circuit is
closed by lighting up the corresponding indicator lamps.
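In software terms the evaluate mode computes the truth table of the plugged-in network. A minimal Python sketch, assuming the network is written as a list of contacts between named nodes, each a make or break contact on one relay (this representation and all function names are illustrative, not the memorandum's); the terminals are called P1 and P2, the names the memorandum uses for the network's plugs:

```python
from itertools import product

RELAYS = "WXYZ"


def contact_closed(relay, make, state):
    """A make (front) contact is closed when its relay is operated;
    a break (back) contact is closed when its relay is released."""
    return state[relay] == make


def circuit_closed(network, state, a="P1", b="P2"):
    """True if a path of closed contacts joins terminals a and b.
    `network` is a list of (node, node, relay, make) tuples."""
    reached, frontier = {a}, [a]
    while frontier:
        node = frontier.pop()
        for u, v, relay, make in network:
            if not contact_closed(relay, make, state):
                continue
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return b in reached


def evaluate(network):
    """Closed (True) or open (False) for each of the 16 states, W X Y Z in binary order."""
    return [circuit_closed(network, dict(zip(RELAYS, bits)))
            for bits in product([False, True], repeat=4)]


# Hypothetical network: W make contact in series with X make contact.
series = [("P1", "n1", "W", True), ("n1", "P2", "X", True)]
print(sum(evaluate(series)))  # 4: closed whenever W and X operate, whatever Y and Z do
```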

When the evaluate-compare switch is turned to the compare
position, the analyzer checks whether the circuit disagrees
with any of the requirements given on the switches. A dis-
agreement is indicated by lighting the lamp corresponding to
the state in question. If the switch is set for closed and the
actual circuit is open in that state, or vice versa, a dis-
agreement is indicated, but no disagreement is ever registered
for a state whose switch is in the don't care position.

If the main control switch is then turned
to the short test position and the start button is pressed again,
the analyzer determines whether any of the contacts in this
circuit could have been shorted out, with the circuit still
satisfying the requirements. The machine indicates on the lamps
beside the contacts which ones have this property.

It may seem surprising to the reader that anyone would
ever need the assistance of a machine to find a contact which
could be shorted out without affecting the circuit. While this
is certainly true of simple examples, in more complicated cir-
cuits redundant elements are often far from obvious, par-
ticularly if there are some states for which the switches are
in the don't care position, since the simplified circuit may be
different only in the don't care states. It is often quite diffi-
cult to see the simplification in these cases.

The analyzer is also helpful in case the circuit be-
ing analyzed is a bridge, because of the complications involved
in tracing out all paths in the bridge. The circuit shown in
Figure 1 is an example of a circuit which was not known to be
inefficiently designed until put on the analyzer. It determined
in less than two minutes (including the time required to plug
the circuit into the plugboard) that one of the contacts shown
can be shorted out. How likely would a human being be to solve
this same problem in the same length of time?


After the short test has been performed, putting
the main control switch in the open test position permits the
analyzer to perform another analogous test, this time open-
ing the contacts one at a time.
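Both simplification tests can be imitated in software by altering one contact at a time and re-checking all sixteen states against the switch settings. The Python sketch below is illustrative only: it assumes a network is a list of contacts written (node, node, relay, make), with terminals named P1 and P2 after the machine's plugs, and a specification mapping each state to closed (True), open (False), or don't care (None); all names are hypothetical. It reports the contacts that can be shorted, or opened, without violating the specification.

```python
from itertools import product

RELAYS = "WXYZ"


def is_closed(network, state, a="P1", b="P2"):
    """Path search from terminal a to terminal b through closed contacts.
    A contact is (node, node, relay, make); relay None marks a short."""
    reached, frontier = {a}, [a]
    while frontier:
        node = frontier.pop()
        for u, v, relay, make in network:
            if relay is not None and state[relay] != make:
                continue  # make contact on a released relay, or break on an operated one
            for here, there in ((u, v), (v, u)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return b in reached


def satisfies(network, spec):
    """spec maps a (W, X, Y, Z) tuple of booleans (True = operated) to
    True (closed), False (open) or None (don't care)."""
    for bits in product([False, True], repeat=4):
        want = spec[bits]
        if want is not None and is_closed(network, dict(zip(RELAYS, bits))) != want:
            return False
    return True


def removable_contacts(network, spec, mode):
    """Indices of contacts that can be shorted (mode='short') or opened
    (mode='open'), one at a time, with the specification still satisfied."""
    result = []
    for i, (u, v, _, _) in enumerate(network):
        trial = list(network)
        if mode == "short":
            trial[i] = (u, v, None, True)   # replace the contact by a short
        else:
            del trial[i]                    # an open contact is simply absent
        if satisfies(trial, spec):
            result.append(i)
    return result


# Hypothetical requirement: closed exactly when W is operated (X, Y, Z irrelevant).
spec = {bits: bits[0] for bits in product([False, True], repeat=4)}
net = [("P1", "P2", "W", True), ("P1", "n", "W", True), ("n", "P2", "X", True)]
print(removable_contacts(net, spec, "short"))  # [2]
print(removable_contacts(net, spec, "open"))   # [1, 2]
```

As on the machine, each contact is altered on its own; two contacts that are each removable need not be removable together, so a circuit simplified this way should be re-tested after each change.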

These two particular types of circuit changes were
chosen because they are easy to carry out, and whenever suc-
cessful, either one reduces the number of contacts required.
There are other types of circuit simplification which it might
be desirable to have a machine perform, including various 
rearrangements of the circuit. These would have required 
more time as well as more equipment to perform, but would 
probably have caused the machine to be more frequently suc- 
cessful in simplifying the circuit. Using such techniques, 
it might be possible to build a machine which could design 
circuits efficiently starting from basic principles, perhaps 
by starting with a complete Boolean expansion for the desired 
function and simplifying it step by step. Such a machine 
would be rather slow (unless it were built to operate at 
electronic speeds, and perhaps even in this case), and not 
enough planning has been done to know whether such a machine 
is practically feasible, but the fact that such a machine is 
theoretically possible is certainly of interest, whether any- 
one builds one or not. 

Another question of theoretical interest is whether 
a logical machine could be built which could design an im- 
proved version of itself, or perhaps build some machine whose 
over-all purpose was more complicated than its own. There 
seems to be no logical contradiction involved in such a mach-
ine, although it will require great advances in the general
theory of automata before any such project could be confidently
undertaken.


To return to the relay circuit analyzer, a final
operation which it performs is done with the main control
switch in the prove position. When the start button is pressed
and the other 4-position switch is moved successively through the
W, X, Y, and Z positions, certain of the eight lamps
W, W', X, X', Y, Y', Z, Z' will light up. The analyzer has
carried out a proof as to which kinds of contacts are required
to synthesize the function using the method of reduction to
functions of one variable, which will be explained in a forth-
coming memorandum. The analyzer here ignores whatever circuit
has been plugged in the plugboard, and considers only the func-
tion specified by the sixteen 3-position switches. If every
circuit which satisfies these specifications requires a back
contact on the W relay, the W' light will go on, etc.


If, for instance, seven of the eight lights are on, 
any circuit for the function requires at least seven contacts, 
and if there is in fact a circuit which uses just seven, the 
machine has, in effect, given a complete proof that this cir- 
cuit is minimal. Circuits for which the machine can give such 
a complete proof are fairly common, although there are also 
circuits (which can be shown to be minimal by more subtle me- 
thods of proof) which this machine could not prove minimal. 
An example is the circuit of Figure 1. This can be simpli- 
fied by the analyzer to a circuit of nine contacts, but in 
the prove position the analyzer merely indicates that at least 
eight contacts are necessary. It can be shown by other meth-
ods that the 9-contact circuit is minimal. But at any rate,
the analyzer always gives a mathematically rigorous lower 
bound for the number of contacts. 

3. The Circuit and Operation of the Relay Circuit Analyzer

A complete circuit diagram of the analyzer is shown 
in Figures 2 and 3. The circuit, as already mentioned, has 
five modes of operation: 1. evaluating a circuit, 2. com-
paring a circuit with desired characteristics, 3. examining
a circuit for contacts that can be shorted without affecting 
operation, 4. examining for contacts that can be opened with- 
out affecting operation, and 5. proving that certain con- 
tacts are necessary in any realization of the function. The 
method of operation of the circuit will be described in turn 
for each of these five modes of behavior. 

4. Evaluation of a Circuit 


In this mode of operation the machine goes through 
in sequence the sixteen possible states of the relays W, X, Y 
and Z, that are involved in the circuit and tests in each state 
whether or not the circuit is closed. If it is closed, the 
corresponding panel light is lit. In this process only the 
right-hand part of the circuit in Figure 2 is involved and 
switches S18 and S19 are both in the evaluate position. The
selector switch S17 goes through one complete revolution to 
make this test. During this revolution the four relays W, X, 
Y, and Z proceed sequentially through their sixteen states. 
This sequence is produced by the first two wipers and decks 
of the selector switch S17. At the first position (0000) 
all four relays are unoperated. At the second step (0001), 
ground on the second wiper operates relay Z, which locks in 
on its own front contact. The circuit is then set to test 
the situation where W, X and Y are unoperated and Z is oper-
ated. At the third step relay Y is operated and locks in on
its own front contact. At the fourth step Z is short-circuited
by the wiper of the first deck. This releases Z and produces 
the state 0010. Proceeding in this manner it will be seen that 
the four relays W, X, Y and Z go through the sixteen states 
indicated. The circuit which is being tested may be thought 
of as being connected between plugs P1 and P2 at the upper
left of the diagram. This network consists of contacts on 
the four relays W, X, Y and Z. Actually some other contacts 
are involved in the network between P1 and P2 (contacts on
the H relays) but in the present mode of operation these H 
relays do not operate and do not affect the hindrance from 
P1 to P2. For a given state of the relays W, X, Y and Z the
plugs PI and P2 will be connected together if, and only if, 
the circuit being tested is closed for that state of the re- 
lays. The relay G will, therefore, operate if, and only if, 
the circuit is closed in the state in question. If it is 
closed, a ground will be applied to the third wiper of the 
selector switch S17 and this will fire the corresponding 
neon lamp. If it is not closed +34 volts will be applied 
to the lamp extinguishing it (if it is already fired). The 
voltage across the lamp circuit, 64-24 or about 60 volts, 
lies between the fire and sustain voltages for the neon 
lamps. Consequently, if they are fired they will remain 
fired, if extinguished they will remain out. Thus the lamps 
remain in the state produced by the evaluation of the cir- 
cuit even after the wiper has left the point in question. 
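The four steps spelled out above (0000, then Z operated to give 0001, then Y operated to give 0011, then Z released to give 0010) change exactly one relay at a time, which is the start of the reflected binary Gray code. Assuming the remaining steps continue in that pattern (the memorandum says only that the relays proceed "in this manner"), the full order can be generated in Python as below; the function name is hypothetical.

```python
def relay_sequence(n=4):
    """Reflected binary Gray code of length 2**n.  Each string lists the relays
    W, X, Y, Z from left to right (1 = operated), so Z is the lowest-order bit."""
    return [format(i ^ (i >> 1), f"0{n}b") for i in range(2 ** n)]


seq = relay_sequence()
print(seq[:4])  # ['0000', '0001', '0011', '0010']
```

Every one of the sixteen states appears once, and each step operates or releases a single relay.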

The movement of the stepping switch is produced by 
a three-stage buzzer circuit consisting of relays U, V and P. 
In the buzzing condition the parallel S' and T' combination
in series with U will be closed. The operation of U ener- 
gizes V through the front U contact in series with the V 
coil. The operation of V then operates P in a similar manner. 
The operation of P releases U through the P' contact. This
releases V, which releases P, etc.
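Each relay in the chain follows the one before it, with a single inversion through the P' contact, so the chain can never settle and keeps cycling. A hypothetical Python model that steps the relays one at a time in the order described (real relays act on their own timing, and S' and T' are assumed closed here) shows the six-phase cycle, during which P operates once and so pulses the selector switch once:

```python
def buzzer_phases(cycles=1):
    """Discrete-time caricature of the three-stage buzzer.  Relays change one at
    a time in the order U, V, P; S' and T' are assumed closed (buzzing enabled).
    Returns the (U, V, P) state after every phase, True meaning operated."""
    u = v = p = False
    history = []
    for _ in range(cycles):
        for relay in "UVPUVP":
            if relay == "U":
                u = not p   # U's path runs through the P' (break) contact
            elif relay == "V":
                v = u       # V is energized through a front U contact
            else:
                p = v       # P follows V in the same way
            history.append((u, v, p))
    return history


for phase in buzzer_phases():
    print(phase)
```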

At the start of an evaluation, switch S18 will be
in the evaluate position, switch S19 in the evaluate position,
selector switch S17 at position 22 (and relay S, therefore,
operated) and selector switch S16 at position 21 (with relay T,
therefore, operated). When the starting push button S20 is
pressed, magnet M1 of stepping switch 1 is energized. When
S20 is released, M1 releases and the stepping switch moves to
position one. This releases relay S and the three-stage
buzzer U, V, P starts operating. At each cycle of this buz-
zer the coil of selector switch S17 is energized and released
by a make contact on the P relay. This sequences the relays
W, X, Y and Z through their sixteen states, as already des-
cribed, and indicates on the neon lamps the states for which
the circuit being tested is closed. When the wipers reach
level 22, relay S operates, stopping the buzzer and ending the
evaluation.

5. The Comparison Mode of Operation

In this mode of operation the circuit set up on the 
plugboard is to be compared with the settings of the sixteen 
three-position switches. If in any state the circuit disagrees 
with the switch setting the corresponding neon lamp will light 
up. For this test switch S18 is set in the evaluate position 
and switch S19 in the compare position. When the starting push 
button S20 is pressed, the buzzing circuit U, V, P starts as 
before, cycling the selector switch S17 through one complete 
revolution. The four relays, as before, go through their six-
teen possible states and the relay G, as before, operates or
not, depending on whether the circuit being tested is closed 
or not. The lamps, however, are no longer controlled directly 
by the relay G, but instead by contacts on the relay A. The 
relay A is connected to operate, if, and only if, the circuit 
condition of the network being tested (open or closed) dis- 
agrees with the setting of the corresponding three-position 
switch. This result is obtained by having one end of the coil
of relay A connected (via the fourth wiper of selector switch
S17) to +24 volts, nothing (i.e., floating), or minus, according
as the desired behavior of the circuit in the state in question
is open, "don't care", or closed (as represented by the setting
of the three-position switch). The other end of the relay A 
is connected to +24 volts or minus, according as the actual 
circuit under test is open or closed (this being carried out 
by a transfer on the G relay). The relay A will operate only 
if the two ends of the coil receive different polarities, and 
this will occur only if the switch setting differs from the 
state of the network under test as indicated by the state of 
the relay G. If such a disagreement occurs the corresponding 
lamp is fired by a ground coming in the third wiper of selec- 
tor switch S17. 
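Relay A therefore acts as an exclusive-or of "switch says closed" and "circuit is closed", disabled whenever the switch end of its coil floats. A truth-table model in Python (the names and the string encoding of polarities are illustrative only):

```python
PLUS, MINUS = "+24 V", "minus"

# End of relay A's coil fed from the three-position switch (via wiper 4 of S17);
# None means the end is left floating.
SWITCH_END = {"open": PLUS, "don't care": None, "closed": MINUS}


def relay_a_operates(switch_setting, circuit_closed):
    """True when relay A would operate: both coil ends connected and at
    different polarities, i.e. the tested circuit contradicts the switch."""
    switch_end = SWITCH_END[switch_setting]
    g_end = MINUS if circuit_closed else PLUS   # set by the transfer on relay G
    return switch_end is not None and switch_end != g_end


for setting in ("open", "don't care", "closed"):
    print(setting, [relay_a_operates(setting, c) for c in (False, True)])
```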

The starting and stopping are carried out by the 
same means as used in the evaluate mode. 

6. The Short Test 

In testing for contacts in the circuit that can be 
shorted, the sequencing is somewhat more involved. Roughly 
speaking, the various contacts used in the circuit are short- 
circuited one-by-one, and for each contact the circuit goes 
through a sequence similar to the comparing mode of behavior 
just described (comparing the circuit when this contact is 
shorted with the desired characteristics set up on the three- 
position switches). If any disagreement is found, the neon 
lamp associated with the contact in question is fired, indi-
cating that this contact is necessary in the circuit and cannot
be shorted. Actually, the sequence is a bit more complicated
since, to save time and equipment, the tests on the make and
break parts of a transfer in the circuit being tested are
carried out in the same buzzer cycle.
To carry out the short test, switch S18 is put in
the short position (the position of S19 is irrelevant). The
selector switches S16 and S17 start in positions 21 and 22
respectively, so that relays S and T are both operated. When
the starting button S20 is pressed, the magnets of both S16 
and S17 are energized and when S20 is released they step 
ahead one step releasing both S and T and allowing the buzzer 
circuit to start. The first step of selector switch S16 
causes E to operate. This removes the voltage from the in- 
dicating lamps L16 to L39 (removing any indication on these 
lamps from previous runs). Stepper 1 then proceeds through 
a complete revolution. At step 17 the second wiper applies 
a voltage to the coil of S16, pulsing S16 ahead one notch.
This releases E, and reapplies voltage to the indicating
lamps L16 to L39. The wipers of selector switch S16 are now
connected to position 1 (the top row) of this selector. The
sixth wiper operates relay H1, which disconnects the first W
transfer from the circuit being tested. The three points in
the circuit being tested that were previously connected to
this transfer (on the W relay) are brought down to points
P3, P5 and P7, P5 coming through the third wiper. The free
ends of the W transfer, that are now disconnected from the 
circuit being tested are brought down via wipers 2 and 4. To 
test whether either part of this transfer can be shorted, the 
selector switch S17 goes through a complete cycle, putting 
the relays W, X, Y and Z in each possible state as in prev- 
ious modes of operation. In each state, the first test is 
to short P3 to P5, which in effect shorts the nodes of the 
circuit normally connected to the W part of the contact, and 
the circuit state is compared with the desired specification 
on the three-position switch. A disagreement operates relay
A which, by way of wiper 1, fires the lamp corresponding to 
the W contact. This shorting of the nodes occurs in the buz- 
zer cycle during the period when the relay U is operated. 
The A contact is connected to the corresponding lamp through 
contact V and P' in series. This gives relay A time to oper- 
ate (or release from a previous operation) before its reading 
is applied to the lamp, and also disconnects the lamp before 
the state of A is changed by the next operation. 

The second test in the same buzzing cycle is to 
short the break contact of the transfer. This occurs when U 
releases, connecting P3 to P4 and P5 to P7. The W make is 
then connected as usual in the circuit being tested (via the
H1 make, U' and wiper 2) and the nodes previously connected
to the back W' contact are shorted via the 3rd wiper of sel-
ector switch S16. In this part of the buzzing cycle the dis-
agreement relay is connected via P and V' contacts (for timing
margins similar to P' and V before) and the 5th wiper, to the
lamp corresponding to the W' or break contact. This lamp 
will fire, as before, if a disagreement occurs indicating that 
the contact is necessary. 

After selector switch S17 has run through all states
(rows 1 to 16) it applies ground through wiper 2 to the magnet
of selector switch S16, advancing it one step. The machine
now applies the shorting test to the X and X' contacts connected
to the second row of selector switch S16. Proceeding in this
manner it tests all the contacts. On reaching row 13, the 6th
wiper of selector S16 applies ground to its own coil through
its own back contact. This causes it to step rapidly through
the remaining positions until it reaches row 21, where it oper-
ates relay T. The first selector switch is meanwhile still
being pulsed by the buzzer circuit. After T operates, the
first time S17 reaches row 22, relay S operates and the buz-
zer stops. This completes the test.

If it is desired to hurry the machine through the
latter part of a test (for example if only a few of the avail-
able contacts are being used and these are near the top) the
reset button S21 can be pressed. This causes S16 to run 
rapidly to the stop position (row 21). 

7. The Open Test 

The test for opening contacts proceeds exactly as 
the short test just described, except that having switch S18
in the open position opens wiper 3 of S16. This opens the 
short that was applied in the previous test to the nodes 
normally connected to the contact being tested. The relay 
therefore indicates the behavior of the circuits when the 
different contacts are opened. 

8. The "Prove" Mode of Operation

When switch S18 is set in the "prove" position
the machine indicates, by lighting some of the lamps L40 to 

that certain contacts are necessary in any circuit which 
realizes the switching function set up on the sixteen three- 
position switches. This indication is obtained by moving 
switch S22 through its four possible positions. In the W 
position the machine tests whether W and/or W' contacts are
necessary and if so, lights the corresponding lamps etc. 


The method of operation is based on the following
result in switching theory (stated for simplicity for the case
of four variables). At least one W (make) contact is necessary
in any realization of a given switching function if there
are one or more states of the other relays (X, Y and Z) such
that, when the X, Y and Z relays are in such a state, changing
the W relay from unoperated to operated changes the function
from open to closed. At least one W' (break) contact is
necessary if there exists a state of the X, Y and Z relays such
that, when they are in this state, operating the W relay changes
the circuit from closed to open. Both results are obvious, since
the only way by which operating the W relay alone could close a
previously open circuit is by establishing an operating path
through a make contact on the W relay, and similarly for the
condition with a break contact.
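The rule translates directly into a check over a partially specified truth table. This sketch assumes the specification is a dictionary from relay states (w, x, y, z), with 1 meaning operated, to "closed", "open" or "dont_care"; the names are illustrative. A don't-care entry never forces a contact, since the rule only compares pairs of specified states that differ in one relay.

```python
from itertools import product

RELAYS = ("W", "X", "Y", "Z")

def necessary_contacts(spec):
    """Contacts forced by the rule: "W" for a make contact, "W'" for a break contact."""
    needed = set()
    for i, relay in enumerate(RELAYS):
        for state in product((0, 1), repeat=4):
            if state[i]:
                continue  # start from each state with this relay unoperated
            operated = state[:i] + (1,) + state[i + 1:]
            before, after = spec[state], spec[operated]
            if (before, after) == ("open", "closed"):
                needed.add(relay)          # operating the relay closes the circuit
            elif (before, after) == ("closed", "open"):
                needed.add(relay + "'")    # operating the relay opens the circuit
    return needed

# Example: closed exactly when W is operated and X is released; Y and Z are irrelevant.
spec = {s: ("closed" if s[0] == 1 and s[1] == 0 else "open")
        for s in product((0, 1), repeat=4)}
print(sorted(necessary_contacts(spec)))  # ['W', "X'"]
```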

The condition that a W contact is necessary can 
also be thought of geometrically in the following way. The 
sixteen states of the four relays can be thought of as the 
vertices of a four-dimensional cube. This cube consists of 
two three-dimensional subcubes, the first being the eight 
states of the X, Y, Z relays with W not operated, and the 
second, the eight states of the X, Y, Z relays with W operated.
If there is any point in the "W unoperated" cube in which the
circuit is open (closed) while being closed (open) in the
corresponding point of the "W operated" cube, at least one
W (W') contact is necessary.

The "Prove" part of the circuit can best be under- 
stood in terms of this geometrical picture. A two-terminal 
network with terminals a and b is set up in the machine, 
corresponding to this cubeo Every vertex of the cube for which 
the circuit should be closed is connected to terminal a; all 
vertices for which the circuit should be open are connected 
to terminal b ("don't care" vertices are left floating). When 
testing for the necessity of W or W contacts, eight diodes 
are connected between corresponding points of the three- 
dimensional subcubes mentioned above. These point from the 
"W unoperated" subcube to the "W operated" subcube. Current 
will pass from terminal a to terminal b if and only if a W 
contact is necessary. This is true since this conduction 
can take place only by entering the cube at a closed state 
(these being the only ones connected to terminal a), passing 
through a diode in the conducting direction (this requires 
that the closed state be in the "W unoperated" cube) and leav- 
ing the cube to terminal b at an open state. Thus the con- 
ditions for conduction from a to b are identical with the con- 
ditions for necessity of a W contact. In a similar manner, it 
may be seen that the network will conduct from b to a if and 
only if a W contact is necessary. 
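The equivalence argued above can be checked mechanically. This sketch builds the network for one relay as a directed graph (each plain wire becomes an arc in both directions, each diode a single arc from a "W unoperated" vertex to its "W operated" partner) and compares reachability with the direct rule. Following the argument, conduction from a to b corresponds to a closed-to-open change (a break contact), and conduction from b to a to an open-to-closed change (a make contact). The dictionary representation and the function names are assumptions of the sketch.

```python
import random
from itertools import product

def reaches(spec, i, source, sink):
    """Current path from source to sink in the 'prove' network for relay index i."""
    arcs = {}

    def connect(u, v):
        arcs.setdefault(u, []).append(v)

    for v, value in spec.items():
        if value == "closed":          # ordinary wire to terminal a, usable both ways
            connect("a", v)
            connect(v, "a")
        elif value == "open":          # ordinary wire to terminal b, usable both ways
            connect("b", v)
            connect(v, "b")
        if v[i] == 0:                  # diode conducts only toward the "operated" half
            connect(v, v[:i] + (1,) + v[i + 1:])
    seen, stack = {source}, [source]
    while stack:
        for nxt in arcs.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sink in seen

def flips(spec, i, before, after):
    """Direct rule: some state changes from `before` to `after` when relay i operates."""
    return any(spec[v] == before and spec[v[:i] + (1,) + v[i + 1:]] == after
               for v in spec if v[i] == 0)

# Compare the network with the direct rule on random partially specified functions.
random.seed(1)
for _ in range(200):
    spec = {v: random.choice(["closed", "open", "dont_care"])
            for v in product((0, 1), repeat=4)}
    for i in range(4):
        assert reaches(spec, i, "a", "b") == flips(spec, i, "closed", "open")  # break
        assert reaches(spec, i, "b", "a") == flips(spec, i, "open", "closed")  # make
print("network agrees with the direct rule")
```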


In operation, the circuit is alternately tested for 
conduction in the two directions. The alternation is obtained 
by operation of the four-stage buzzer previously described. 
When P is operated, the circuit is tested for conduction from
A to B. If this condition occurs, it fires the corresponding
neon lamp (for the W, X, Y or Z make contact). When P is
released, voltage is applied to the AB network in the reverse
direction and, if conduction occurs, it fires the corresponding
neon lamp (for the W', X', Y' or Z' break contact). These
lamps remain fired until released either by turning off the
main power or flipping the "evaluate-compare" switch S19 from
one position to the other. 

Although it has been explained that the circuit for 
doing these tests is laid out in the shape of a four-dimensional 
cube, the circuit diagram of Figure 3 is not drawn by the use 
of a direct projection of such a cube, but is laid out in a 
plane by a method due to W. Keister (The Design of Switching
Circuits, D. Van Nostrand, 1951, p. 174), which simplifies its 

It can easily be verified that by putting switch
S22 in any one of its four positions the circuit in Figure 3
reduces to a 4-dimensional cube with 8 diodes joining its two
halves. However, the manner in which these 4 sets of 8 diodes
each were combined to give a total of only 14, while at the
same time using only 8 decks of the switch S22, may be of
interest. It can be applied to give similar economies in the
design of analogous circuits for cubes of any dimension. This
method depends on some concepts due to R. W. Hamming (Bell
System Technical Journal, 29, pp. 147-160, April, 1950). It
is possible to divide the vertices of an n-cube into two
mutually exclusive and collectively exhaustive classes, called
parity classes, depending on whether the number of coordinates
having the value 1 is even or odd. If a point belongs to one
parity class, all of the points which have distance 1 from it
(and hence differ in only one coordinate from it) are in the
opposite parity class. This means that every edge of the cube
connects vertices of opposite parity classes. Since in every 
position of S22 the diodes are connected along edges of the 
cube, it means that it is necessary to be able to connect 
diodes only between points of opposite parity classes. 

Thus the diodes are all connected to the points of 
one parity class, and the decks of switch S22 are connected to 
the points of the other class. If one diode pointing toward 
and one pointing away from each point of the even parity class 
is provided, then the switch contacts can connect each point of 
the other parity class to the other end of the proper one of 
these two diodes. In the actual circuit not quite this many 
diodes are used, since the points 0000 and 1111 require only 
one of the two diodes. 
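The diode count quoted above can be reproduced by a short calculation. The sketch assumes, as described, that every diode hangs on an even-parity vertex and that, in the switch position for coordinate i, the diode runs from the vertex whose coordinate i is 0 to the vertex whose coordinate i is 1. An even vertex then needs an outgoing diode if any of its coordinates is 0 and an incoming one if any is 1, so only 0000 and 1111 get by with a single diode. The function names are illustrative.

```python
from itertools import product

def parity(v):
    """0 for the even parity class, 1 for the odd one."""
    return sum(v) % 2

def diodes_needed(n):
    """Diodes required when all of them are attached to even-parity vertices."""
    total = 0
    for v in product((0, 1), repeat=n):
        if parity(v):
            continue  # odd vertices are reached through the switch decks instead
        total += (0 in v) + (1 in v)  # outgoing if some bit is 0, incoming if some bit is 1
    return total

def edges_join_opposite_parity(n):
    """Every edge of the n-cube changes exactly one coordinate, hence the parity."""
    return all(parity(v) != parity(v[:i] + (1 - v[i],) + v[i + 1:])
               for v in product((0, 1), repeat=n) for i in range(n))

print(diodes_needed(4))                 # 14, versus 4 x 8 = 32 without sharing
print(edges_join_opposite_parity(4))    # True
```

The same count works for any dimension: each of the 2^(n-1) even vertices needs two diodes except the all-zero and all-one vertices (the latter being even only when n is even).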


9. Notes and Comments 

The small size and portability of this machine depend
on the fact that a mixture of relay and electronic circuit
elements was used. The gas diodes are particularly suited for use
where a small memory element having an associated visual display 
is required, and the relays and selector switches are particu- 
larly suited for use where the ability to sequence and inter- 
connect using only a small weight and space is required. In 
all, the relay circuit analyzer uses only 24 relays, 2 selector 
switches, 48 miniature gas diodes, and 14 germanium diodes as 
its logical elements. 

It may be of interest to those familiar with gen- 
eral purpose digital computers to compare this method of solu- 
tion of this problem on such a small, special-purpose machine 
with the more conventional method of coding it for solution on 
a high-speed general-purpose computer. One basic way in which 
the two methods differ is in the directness with which the cir- 
cuits being analyzed are represented. On a general-purpose 
computer it would be necessary to have a symbolic description 
of the circuit, probably in the form of a numerical code des- 
cribing the interconnections of the circuit diagram, and repre- 
senting the types of contacts that occur in the various parts 
of the circuit by means of a list of numbers in successive 
memory locations of the computer. On the other hand, the relay 
circuit analyzer represents the circuit in a more direct and 
natural manner, by actually having a copy of it plugged up on 
the front panel. 

This difference in the directness of representation 
has two effects. First, it would be somewhat harder to use 
the general-purpose computer, because the steps of translating 
the circuit diagram into the coded description and of typing 
it onto the input medium of the computer would be more compli- 
cated and lengthy than the step of plugging up a circuit dir- 
ectly. The second effect is in the relative number of logical 
operations (and hence, indirectly, the time) required by the 
two kinds of machines. To carry out the fundamental step in 
this procedure of determining whether the given circuit (or 
some modification of it obtained by opening or shorting a 
contact) is open or closed for some particular state of the 
relays requires only a single relay operate time for the re- 
lay circuit analyzer. However, the carrying out of this fun- 
damental step on a general-purpose digital computer would re- 
quire going through several kinds of subroutines many times. 
There would be several ways of coding the problem, but in a 
typical one of them the computer would first go through a 
subroutine to determine whether a given contact were open or 
closed, repeating this once for each contact in the circuit, 


and then would go through another subroutine once for each 
node of the network. Altogether this would probably involve 
the execution of several hundred orders on the computer, al- 
though by sufficiently ingenious coding this might be cut down 
to perhaps 100. Since each order of a computer takes perhaps 
100 times the duration of a single logical operation (i.e., a 
pulse time, if the computer is clock-driven), it turns out that 
what takes 1 operation time on one machine takes perhaps 10,000
on another. 
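As a worked version of this estimate, using only the round figures given in the text (they are the memo's rough estimates, not measurements):

```latex
\underbrace{100}_{\text{orders per step}} \times
\underbrace{100}_{\text{pulse times per order}}
  = 10^{4}\ \text{pulse times on the general-purpose machine}
  \quad\text{vs.}\quad 1\ \text{relay operate time on the analyzer}
```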

Since 10,000 is approximately the ratio between 
the speed of a relay and of a vacuum tube in performing logical 
operations, this gain of about 10,000 from the directness of 
the representation permits this relay machine to be as fast as 
a general-purpose electronic computer. 

This great disparity between the speeds of a general- 
purpose and of a special-purpose computer is not typical of 
all kinds of problems, since a typical problem in numerical 
analysis might only permit of a speed-up by a factor of 10 
on a special-purpose machine (since multiplications and div- 
isions required in the problem use up perhaps a tenth of the 
time of the problem) . However, it seems to be typical of 
combinatorial problems that a tremendous gain in speed is 
possible by the use of special rather than general-purpose 
digital computers. This means that the general-purpose
machines are not really general in purpose, but are specialized
in such a direction as to favor problems in analysis. It is
certainly true that the so-called general-purpose machines
are logically capable of solving such combinatorial problems,
but their efficiency in such use is definitely very low. The
problems involved in the design of a general-purpose machine
suitable for a wide variety of combinatorial problems seem to
be quite difficult, although certainly of great theoretical
interest.

10. Conclusion 

An interesting feature of the relay circuit analy- 
zer is its ability to deal directly with logical circuits in 
terms of 3-valued logic. There would be considerable interest 
in techniques permitting easy manipulation on paper with such 
a logic, because of its direct application to the design of 
economical switching circuits. Even though such techniques 
have not yet been developed, machines such as this can be of 
value in connection with 3-valued problems. 


Whether or not this particular kind of machine 
ever proves to be useful in the design of practical relay 
circuits, the possibility of making machines which can assist 
in logical design procedures promises to be of value to 
everyone associated with the design of switching circuits. 
Just as the slide rule and present-day types of digital com- 
puters can help perform part of the routine work associated 
with the design of linear electrical networks, machines such 
as this may someday lighten much of the routine work assoc- 
iated with the design of logical circuits. 

Attached : 

Photograph No. 196492 
Figures 1, 2 and 3 




[Figures 2 and 3: circuit drawings (Fig. 3 is the "prove" circuit), Bell Telephone Laboratories drawing B-349292. Symbols used: relay coil, front contact, back contact, selector switch.]

The central part of the Throbac circuit is a relay
accumulator which can count up to eighty in a modified Roman
numeral system. The accumulator is arranged so that it is possible
to add or subtract I, V, X or L to the contents of the accumulator.
It consists of seven stages of W-Z circuits. The first three stages,
W1-Z1, W2-Z2 and W4-Z4, accumulate "I's". These stages are arranged
to count up to four and recycle to zero at the fifth I. Thus,
within these stages either zero, one, two, three or four "I's" will
be registered. The number of "I's" appears in binary form in the
three W-Z stages.

The next W-Z combination accumulates "V's", either
zero or one V being registered here. The final three stages,
WX1-ZX1, WX2-ZX2 and WX4-ZX4, accumulate "X's" from zero up to seven.

If the relay F is operated, the accumulator is arranged
to add; if F is released, to subtract. Supposing F operated,
closing P_I adds I to the contents of the accumulator. Closing
P_V adds V, P_X adds X and P_L adds L. This may be verified by
tracing out the circuit paths into the W-Z circuits in the various
cases. For example, if the accumulator has zero in it, all W's and
Z's are released, and when P_I is closed a ground passes through a
chain of P_I, F and Z contacts to pulse the W1-Z1 pair, and this is
the only W-Z pair to receive a ground. If, instead, P_L had been
pulsed, the WX1-ZX1 pair and the WX4-ZX4 pair would both receive
ground, thus registering L (XXXX + X). A study of the circuit
will show that in all cases it adds or subtracts (according to
the state of F) I, V, X or L when P_I, P_V, P_X or P_L is operated.

At the bottom of this circuit a connection leads out to
control the C relay. This connection will be seen to carry a
ground when a number is added to the accumulator which causes it
to overrun its limit, either by addition, giving a number greater
than seventy-nine, or by subtraction, giving a number less than
zero. In these cases the carry to, or borrow from, what would be
the next column goes out on the lead in question to control the
C relay. This relay, to be described later, indicates the end
of a division.
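Arithmetically, the accumulator behaves like a mixed-radix counter with radices 5, 2 and 8 for the I, V and X stages, giving 5 x 2 x 8 = 80 states (0 to 79). The sketch below models only that arithmetic, not the W-Z relay pairs; the class and method names are hypothetical, and a carry or borrow leaving the X stages sets a flag that stands in for the C relay.

```python
RADICES = (5, 2, 8)          # I stages hold 0-4, the V stage 0-1, the X stages 0-7
WEIGHTS = (1, 5, 10)
ENTRY = {"I": (0, 1), "V": (1, 1), "X": (2, 1), "L": (2, 5)}  # (stage, count); L is five X's

class Accumulator:
    """Arithmetic model of the Throbac accumulator (not its relay wiring)."""

    def __init__(self):
        self.digits = [0, 0, 0]   # numbers of I's, V's and X's registered
        self.c_relay = False      # stands in for relay C: carry or borrow off the top

    def enter(self, symbol, add=True):
        stage, count = ENTRY[symbol]
        carry = count if add else -count
        while carry and stage < len(RADICES):
            total = self.digits[stage] + carry
            self.digits[stage] = total % RADICES[stage]  # e.g. recycle to zero at the fifth I
            carry = total // RADICES[stage]              # +1 carry or -1 borrow onward
            stage += 1
        if carry:
            self.c_relay = True

    def value(self):
        return sum(d * w for d, w in zip(self.digits, WEIGHTS))

acc = Accumulator()
for symbol in "LXIIII":
    acc.enter(symbol)
print(acc.digits, acc.value())   # [4, 0, 6] 64
```

Entering L as five X's mirrors the pulsing of the WX1-ZX1 and WX4-ZX4 pairs described above.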

The number registered in the accumulator is displayed 
on the panel by means of a series of thirteen lights. These 
lights are controlled by contact networks on the W-Z relays of 
the accumulator. The contact networks translate from the modified 
Roman numeral notation to the standard one. The part of the number 
which is a multiple of ten appears in the three left columns of
lights. The part of the number
registered which is less than ten appears in the four right 
columns of lights. 
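The translation performed by the light-control networks can be written as two table lookups, one for the tens (from the X count) and one for the units (from the V and I counts). The longest tens string (LXX or XXX) has three symbols and the longest units string (VIII) has four, which agrees with the three left and four right columns of lights. The function below is a hypothetical sketch of the mapping only, not of the contact networks.

```python
TENS = ("", "X", "XX", "XXX", "XL", "L", "LX", "LXX")                 # indexed by the X count 0-7
UNITS = ("", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX")  # indexed by 5*V + I

def standard_roman(i_count, v_count, x_count):
    """Standard Roman numeral for the accumulator contents (empty string for zero)."""
    if not (0 <= i_count <= 4 and 0 <= v_count <= 1 and 0 <= x_count <= 7):
        raise ValueError("outside the accumulator's range")
    return TENS[x_count] + UNITS[5 * v_count + i_count]

print(standard_roman(4, 0, 6))  # LXIV: IIII plus six X's, the example below
```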

As an example, suppose the number registered is LXIV
(64). In the accumulator the W-Z pairs W4-Z4 (IIII), WX4-ZX4 and
WX2-ZX2 (XXXXXX) will be operated and the other W-Z pairs released.
In the accumulator light circuit it will be found that the
corresponding lights receive a ground and are illuminated,
displaying the number LXIV.

The sequencing for adding or subtracting a number entered
in the keyboard into the accumulator is carried out chiefly by
stepping switch A. For such an addition or subtraction, this
stepper sweeps across the keyboard, starting from the right-hand
column and sequentially adding or subtracting the numbers registered
in each column. The addition sequence is started by pressing the
ADD button, which causes P to operate and lock in through a back
contact on K. The operation of P causes the buzzer relay B to start
operating and releasing at about ten cycles per second. When B
closes it pulses the stepping coil of stepper A, moving it ahead one
notch. The release of B puts a ground on the wipers of the stepper
and, therefore, on the first vertical connection through the
keyboard switches. Let us suppose that the number XLVI is entered in
the keyboard in the four right-hand columns. I is then registered
in the rightmost column and the ground from the stepper passes
through this I push button to operate the P_I relay. The F relay
has been operated by P and therefore I is added to the previous
contents of the accumulator. On the next cycle of the buzzer, the
stepper moves to the next column and operates the P_V relay, which
adds V into the accumulator. P_V also causes E to operate and
lock in. The purpose of this is to cause any further I's
to be subtracted rather than added. On the next cycle of the
buzzer, ground is applied to the third vertical of the keyboard
and, because of the L entered there, operates the P_L relay. This
adds L to the accumulator and also operates the S relay, which also
locks in. The operation of S signifies that an L has
occurred and consequently any X's or V's now encountered on the
keyboard must be subtracted. On the next cycle of the buzzer, the
fourth vertical receives ground and, because of the X in this column,
P_X operates. Since S is closed, the relay H also operates, releasing
F and making the accumulator subtract instead of add. The
timing of these relays is adjusted so that F releases before the
P_X pulse could add into the accumulator. X is therefore subtracted.
On the next three cycles of the buzzer, no further numbers are
encountered and the accumulator does not change. On the eighth
cycle, the wipers pass a ground to the K relay, which locks in
momentarily, and also to the reset coil of the stepper. The
operation of K releases relays P, E and S and also disconnects the
buzzer and the wipers. The reset coil allows the wipers to return to
their normal position, and since they have been disconnected by K
they have no effect as they pass over the keyboard columns. When the
wipers reach their normal position they open the off-normal switch
of the stepper. This releases K and the addition operation is complete.

The process of subtraction is essentially the same.
Pressing the subtract button causes M to operate and lock up, which
starts the buzzer and the stepping operations. In this case,
however, F is normally released, so that numbers encountered in
the keyboard are normally subtracted. However, when a smaller
number is encountered after a larger one the relay F will operate,
causing it to be added.
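The latching rule in the example can be summarized as a right-to-left scan in which two flags record whether a V and whether an L have already been passed. The sketch below returns the signed amounts that would be sent to the accumulator. The text says only that P_V operates E; treating an X the same way (which a numeral such as IX requires) is an assumption of this sketch, as are the names. Pressing SUBTRACT instead of ADD simply reverses every sign, in line with the subtraction paragraph.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}

def keyboard_steps(numeral, subtract=False):
    """Signed amounts fed to the accumulator while stepper A scans right to left."""
    e_latch = False   # like relay E: a V (or, by assumption, an X) has been passed
    s_latch = False   # like relay S: an L has been passed
    steps = []
    for symbol in reversed(numeral):
        negative = (symbol == "I" and e_latch) or (symbol in "VX" and s_latch)
        if subtract:
            negative = not negative   # F is normally released for SUBTRACT
        steps.append(-VALUES[symbol] if negative else VALUES[symbol])
        if symbol in "VX":
            e_latch = True
        if symbol == "L":
            s_latch = True
    return steps

print(keyboard_steps("XLVI"))   # [1, 5, 50, -10], a net addition of 46
```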

Multiplication is obtained by successive addition. If
the MV button is pressed, the machine adds the contents of the
keyboard into the accumulator V times; if the MX button is pressed,
X times. This counting is controlled by stepper B. If the MI
button is pressed, the keyboard contents are added or subtracted
depending on whether the MV or MX buttons have been previously
pressed.

Suppose VIII is to be multiplied by IV. VIII is entered
in the keyboard and first the MV and then the MI push buttons are
pressed. When the MV button is pressed, relay MV operates and
locks in through Q'. The relay T also operates, locking in through
the Clear Upper key. The relay T signifies that I's occurring later
in the multiplier must be interpreted as negative. The operation
of MV causes the P relay to operate and start an addition operation.
When stepper A reaches the eighth point, K operates, causing the
stepping coil of stepper B to receive a ground (through the MV make).
When stepper A resets to normal, P again operates, again adding the
keyboard contents into the accumulator and advancing stepper B at
the end of the addition. This process continues until stepper B
reaches its fifth point. There the ground on the wipers operates
relay Q, which releases MV and stops the series of additions.
Q locks in and applies ground to the reset coil of stepper B,
returning it to normal. When it reaches normal, the off-normal
contacts are opened and Q is released.

Next the MI button is pressed. Since T is in (due to
the previous operation of MV), this causes H to operate and the
machine subtracts the keyboard contents from the accumulator. This
completes the multiplication. The MX button produces a sequence
similar to the MV button, except that stepper B must go to the tenth
point instead of the fifth to operate Q and stop the series of
additions.

If another multiplication is to be performed, the Clear 
Upper button should be pressed. This releases T and resets stepper 
B to normal if for some reason it is not already there. 
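Read as an algorithm, the multiplication is repeated addition driven by the multiplier's symbols entered right to left (V before I for IV, as in the example), with T recording that an MV or MX has already run. The sketch assumes that right-to-left order for any multiplier and ignores the accumulator's 0 to 79 limit; the function name is hypothetical.

```python
def multiply(multiplicand, multiplier):
    """Repeated-addition product, pressing MI/MV/MX for the multiplier read right to left."""
    accumulator = 0
    t_relay = False                      # set by MV or MX: later I's count as negative
    for symbol in reversed(multiplier):
        if symbol == "I":                # MI: one addition, or one subtraction if T is in
            accumulator += -multiplicand if t_relay else multiplicand
        elif symbol in ("V", "X"):       # MV or MX: stepper B counts 5 or 10 additions
            for _ in range(5 if symbol == "V" else 10):
                accumulator += multiplicand
            t_relay = True
        else:
            raise ValueError(f"no multiply button for {symbol!r}")
    return accumulator

print(multiply(8, "IV"))   # 32: five additions of VIII, then one subtraction
```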

Division is performed by successive subtraction. The
dividend is entered in the accumulator and the divisor in the
keyboard. When the divide button is pressed, relay E operates and
locks in through P' or K'. C is normally out and E, therefore,
causes M to operate and lock in, starting a subtraction. If, during
this subtraction, the accumulator does not run through zero, C will
not operate and another subtraction will occur, since M will again
operate as soon as K releases. At each subtraction of this sort
the operation of K at the end of the subtraction energizes the
stepping coil of stepper B, advancing it one step. Eventually in
this subtraction process the contents of the accumulator will go
negative. This causes C to operate and indicates that one too
many subtractions have been performed. The last subtraction is
not counted on stepper B since its operating path passes through C'.
The operation of C causes the next operation to be an addition, since
the next ground when K releases is placed on P rather than M. The
machine therefore goes through one addition sequence (compensating
in the accumulator for the extra subtraction). At the eighth point
of this sequence K operates and, since P is operated, the hold on
E opens and E releases. This stops any further additions or
subtractions and also releases the C relay for the next division.
Stepper B will be at a level equal to the number of subtractions
(not counting the extra one) and its position therefore is the
quotient desired. The value of this quotient is indicated on the
quotient lights, which are wired to the contacts of the stepper in
such a way as to indicate in Roman numerals the position of the
wipers. This dial is cleared by pressing the Clear Upper button,
which operates the reset coil of stepper B.
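The division sequence is what is usually called restoring division by repeated subtraction: subtract until the accumulator goes negative, undo the last subtraction, and read the count. The sketch mirrors that control flow with ordinary integers, so it ignores the 0 to 79 range and the Roman display; the function name is hypothetical. As in the text, the overshooting subtraction is not counted.

```python
def divide(dividend, divisor):
    """Quotient and remainder by successive subtraction with one restoring addition."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    accumulator, stepper_b = dividend, 0
    while True:
        accumulator -= divisor        # a subtraction sequence (relay M)
        if accumulator < 0:           # borrow off the top stage: relay C operates
            accumulator += divisor    # one compensating addition (relay P), not counted
            return stepper_b, accumulator
        stepper_b += 1                # stepper B advances only while C is released

print(divide(64, 8))   # (8, 0)
```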

C. E. Shannon

April 9, 1953



C. E. Shannon 

The Tower of Hanoi machine automatically solves a well-known puzzle 
constructed as follows. There are three pegs standing upright in a horizontal plate. 
On the first peg are a number of disks of graduated sizes. The problem is to move all 
these disks to the third peg subject to the rules that (1) only one disk can be moved at 
a time, and (2) a disk can never be placed on top of a smaller disk. 

This puzzle has been treated in the literature. It can be readily proved by
induction that with n disks, $2^n - 1$ moves are necessary. For suppose this formula is
true up to $n-1$. With n disks, in order to move the largest one to the third peg it is
necessary that all the other disks be on the second peg in proper order. This, by
assumption, requires $2^{n-1} - 1$ moves. Moving the largest disk requires one more, and
moving the $n-1$ disks from the second to the third peg, again by the inductive
hypothesis, requires $2^{n-1} - 1$ moves. Consequently the entire operation requires
$2(2^{n-1} - 1) + 1 = 2^n - 1$ moves. Since the formula is true for n = 1, it holds in
general. The argument also shows how to build up a solution for any n from n-1, and
hence, eventually, from the n = 1 case.
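The inductive argument is itself a recursive procedure: move n-1 disks to the spare peg, move the largest disk, then move the n-1 disks back on top of it. A minimal sketch, with pegs numbered 0, 1, 2 as in the text below and the function name assumed:

```python
def hanoi(n, source=0, target=2, spare=1):
    """List of (disk, from_peg, to_peg) moves; disk n is the largest of the n disks."""
    if n == 0:
        return []
    return (hanoi(n - 1, source, spare, target)     # clear the n-1 smaller disks away
            + [(n, source, target)]                 # move the largest disk
            + hanoi(n - 1, spare, target, source))  # restack the smaller disks on it

print(len(hanoi(6)))   # 63 = 2**6 - 1
```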

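For n = 6 (the case handled by the machine) the solution is given by the following table.

[Table: the 64 counts 000000 to 111111 in binary, each with the corresponding positions of the six disks; the entries are not legible in the scan.]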
The first column gives the binary numbers from 0 to 63. The second column
describes the positions of the disks. For example, 000000 means that all disks are on
peg 0. The fifth entry, 000122, means that the three largest disks are on peg 0, the next
smaller disk on peg 1, and the two smallest disks on peg 2. The numbers in the
second column are related in a peculiar manner to the binary numbers in the first
column and can be calculated from them. The process can best be described by an
example. Take, for instance, the binary number 010110. The following calculation is
made:



    0 1 0 1 1 0
    0 2 2 1 2 2
    0 1 2 0 0 2

The columns here alternate + and -. The second row, 022122, is obtained by summing
the first row horizontally mod 3, with a + or - sign depending on the column. Thus 0=0,
2=0-1, 2=0-1+0, 1=0-1+0-1, 2=0-1+0-1+1 and 2=0-1+0-1+1-0 (all mod 3). The
third row is obtained from the second by adding or subtracting, column by column and
with the same alternating signs, the digits of the first row. This row is the
corresponding position of the disks in the solution of the puzzle. It can be shown that
this relation holds in general.
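The calculation is easy to state as a procedure: the second row is a signed running sum mod 3 of the binary digits (most significant first, signs alternating +, -, +, ...), and the third row applies each binary digit once more with its column's sign. A minimal sketch; the function name and the default of six disks are assumptions for illustration.

```python
def disk_positions(count, n=6):
    """Second and third rows of the calculation for a move count (largest disk first)."""
    bits = [(count >> (n - 1 - k)) & 1 for k in range(n)]  # first row, most significant first
    running = 0
    second, third = [], []
    for k, bit in enumerate(bits):
        sign = 1 if k % 2 == 0 else -1             # columns alternate + and -
        running = (running + sign * bit) % 3       # second row: signed running sum mod 3
        second.append(running)
        third.append((running + sign * bit) % 3)   # third row: apply the digit once more
    return "".join(map(str, second)), "".join(map(str, third))

print(disk_positions(0b010110))   # ('022122', '012002'), the worked example
print(disk_positions(4)[1])       # 000122, the fifth entry of the table
```

Comparing consecutive rows for all 64 counts is an easy way to confirm that each step moves exactly one disk onto a peg holding no smaller disk.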

The Tower of Hanoi relay circuit is based on this curious relation. The machine
basically consists of a binary counter (six stages of W-Z counters) which counts from
0 to 63. Contacts on these relays are connected in a network which controls a set of
eighteen lights. There are three lights for each of the six disks, one on each of the 
three pegs. At a given time, one of these three will be on, indicating the position of 
the corresponding disk. As the counter proceeds through its count, the lights are 
switched to indicate the process of the solution. 

The circuit of the machine is shown in Fig. 1. The right hand network controls the 
lights. It will be seen that this consists of a symmetric function lattice in which the 
stages alternately add and subtract mod 3. The ground coming in at the bottom of this 
circuit will appear in columns 0', 1', 2' according to the first number computed in the 
above calculation (i.e. 0'2'2'1'2'2' in the example given). The further calculation 
(012002 in the example) is carried out by the single-stage mod 3 circuits attached to
the basic mod 3 lattice.

It is interesting in this circuit that when one of the larger disks is moved the lamps 
corresponding to smaller disks receive their operating current through a path which is 
switched. The counting process, however, is so rapid that they appear to be 
continuously illuminated. 

The control circuit at the left of the figure contains a three-position key switch. In 
the center position, the machine stops. In the top position, it causes the buzzer B to 
operate the counter and therefore proceed through the solution at about two steps per 
second. When the count reaches sixty-three, the buzzer stops. If the key switch is 
depressed to the lower position (non-locking), the counter is advanced one count. By 
moving the switch between the center and the lower positions the solution can be 
observed step by step. 

Mathmanship, or How to Give an Explicit Solution Without Actually Solving the Problem

After reading several weighty papers giving formulas 
which assume only prime values, I felt moved to develop a few 
further results of the same type. 

Theorem 1. There exists a unique real positive number λ < 1
such that

$$[2^n \lambda] - 2[2^{n-1} \lambda] = \begin{cases} 0 & \text{if } n \text{ is composite} \\ 1 & \text{if } n \text{ is prime} \end{cases}$$

Here [x] means, as usual, the largest integer in x.
The value of λ is .4146 ....
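Theorem 1 is less mysterious once λ is written in binary: the bracket expression extracts the n-th binary digit of λ, so λ is just the number whose n-th binary digit is 1 exactly when n is prime. A sketch using exact fractions, truncating λ after the primes up to a chosen limit (which leaves every digit up to that limit exact); the helper names are illustrative.

```python
import math
from fractions import Fraction

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def lambda_truncated(limit):
    """Sum of 2**-p over primes p <= limit: the binary digits of lambda up to `limit`."""
    return sum((Fraction(1, 2 ** p) for p in range(2, limit + 1) if is_prime(p)), Fraction(0))

def digit(lam, n):
    """[2^n lam] - 2[2^(n-1) lam], i.e. the n-th binary digit of lam."""
    return math.floor(2 ** n * lam) - 2 * math.floor(2 ** (n - 1) * lam)

lam = lambda_truncated(60)
print(float(lam))                              # about 0.414682509851
print([digit(lam, n) for n in range(1, 12)])   # [0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
```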
Theorem 2. There exists a unique real positive number \i < f 
such that the nth prime is given by

- IS?* 1 u] - 2 2 * 1 L2^ u] 

Note the improvement over previous results - this
formula gives all the primes, not just some of them.
For analysts who find the bracket symbol a little 
suspect, we have the following: 
Theorem 3. There exists a real number θ such that $\sin 2^n \theta$ is
positive or negative according as n is prime or composite.

Theorem 4* There exists a real number & such that 

- tan 2^ 5| <^ 

Proofs are left as an exercise for the reader. 



Bell Telephone Laboratories, Incorporated

Cover Sheet for technical memorandum 

subject: The Relay Circuit Synthesizer - Case 20878 


case file ( HWB-WOB-JBF) ( BDH) 



MM 53-140-52
DATE November 30, 1953 
author C. E. Shannon 

E. F. Moore 




L. Almquist
W. Bode
J. Busch
B. Clark
H. Doherty
B. Ferrell
B. Fisk
T. Friis
C. Fry
W. Gilman
W. Hagelbarger
D. Holbrook
C. Keller
A. Korn
D. Lewis

Switching Theory
18 - C. A. Lovell
19 - M. B. McDavitt
20 - J. Meszar
21 - R. K. Potter
22 - F. J. Singer
23 - S. H. Washburn
I. G. Wilson


The Relay Circuit Synthesizer is a machine to aid 
in switching circuit design. It is capable of designing two-terminal
circuits involving up to four relays in a few minutes.
The solutions are usually minimal. The machine, its operation, 
characteristics and circuits are described. 

The Relay Circuit Synthesizer - Case 20878 

MM-53-140-52
MM-53-180-52

November 30, 1953 


Purpose and Operation 

The Relay Circuit Synthesizer (Photograph 214142) 
is a machine to aid in the design of a certain class of relay 
circuits. The type of circuits it handles are two-terminal 
switching circuits involving up to four relays or (by simple 
alterations) other two-valued elements. The desired charac- 
teristics of the circuit to be designed are entered in a set 
of sixteen three-position switches on the front panel of the 
machine. After a period of computation, averaging about five 
minutes, the machine stops and displays a circuit satisfying 
the requirements. The circuit is displayed in geometric form 
on a card in an associated card display mechanism (Photograph 
214140). The labels of the contacts on this card must, however, 
be interpreted in accordance with indicating lights on the 
front panel of the machine to obtain the proper answer to the 
design problem. 

In about eighty per cent of the possible problems 
that can be set up on the machine, the solution it gives will 
be minimal in contacts, i.e., the number of contacts in the 
circuit cannot be reduced. In the remaining twenty per cent, 
the designs cannot be simplified by more than one contact and 
may, in fact, be minimal. 

The sixteen input switches correspond to the six- 
teen possible states of the four relays in the circuit being 
designed. Each of these switches has three positions labeled 
"open," "don T t care" and "closed". If, for a given state of 
these relays, it is desired that the circuit be open, the 
corresponding switch is set in the "open" position. Similarly 
for the "closed" position. If it does not matter whether the 
circuit be open or closed in this state, the switch is set at 
"don't care"# The Synthesizer takes advantage of any switches 
in the "don't care" position in attempting to reduce the 
number of contacts used in the final circuit. It fills in 
these unspecified states in such a way as to minimize contact 
requirements. This ability to handle partially specified 
switching problems is one of the main features of the Synthesi- 
zer and enables it to solve problems for which analytic methods 
are at present ill-adapted. 
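The "don't care" feature can be illustrated with a brute-force sketch: go through a file of candidate circuits in order of increasing contact count and, for each, through every relabeling of the four relays (a permutation of the variables plus optional complementing of each, compare the "Transformation" column in the test results below), stopping at the first whose truth table agrees with every specified state. The three "cards" are hypothetical stand-ins, not entries from the Synthesizer's file, and the search order is an assumption rather than the machine's tape sequence. Here '1' marks closed, '0' open and '-' don't care.

```python
from itertools import permutations, product

STATES = list(product((0, 1), repeat=4))   # (w, x, y, z) with z varying fastest

def table(f, perm=(0, 1, 2, 3), neg=(0, 0, 0, 0)):
    """Truth table of f when its input i is relay perm[i], complemented if neg[i]."""
    return tuple(f(*(s[perm[i]] ^ neg[i] for i in range(4))) for s in STATES)

def compatible(spec, tbl):
    """spec is 16 characters: '1' closed, '0' open, '-' don't care."""
    return all(c == "-" or int(c) == t for c, t in zip(spec, tbl))

def synthesize(spec, cards):
    """First (fewest-contact) card that fits spec under some relabeling, or None."""
    for name, contacts, f in sorted(cards, key=lambda card: card[1]):
        for perm in permutations(range(4)):
            for neg in product((0, 1), repeat=4):
                if compatible(spec, table(f, perm, neg)):
                    return name, contacts, perm, neg
    return None

# Hypothetical three-card file: (name, contact count, function of w, x, y, z).
CARDS = [
    ("series pair", 2, lambda w, x, y, z: w & x),
    ("parallel pair", 2, lambda w, x, y, z: w | x),
    ("transfer pair", 4, lambda w, x, y, z: w ^ x),
]

# Closed when Y is operated and Z released; the first two states are left as don't cares.
spec = "".join("1" if (y and not z) else "0" for w, x, y, z in STATES)
spec = "--" + spec[2:]
print(synthesize(spec, CARDS)[:2])   # ('series pair', 2)
```

Complementing a variable here corresponds to exchanging make and break contacts, which leaves the contact count of a card unchanged.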


In addition to the direct circuit designing pro- 
cedure outlined above, the Synthesizer is equipped with 
controls for other modes of operation. It may be run at 
low speed for demonstration purposes, it may be set up to 
find all the circuits in its card file satisfying the re- 
quirements (not just the one with the smallest number of 
contacts) and it may be used to determine various mathematical 
properties associated with switching functions.

By changing the paper tape and the card file used 
(but without any internal change within the electrical part 
of the machine) it can be made to solve design problems in- 
volving diode circuits instead of relay contact circuits. 
By a still different tape and set of cards it can minimize 
the number of transfers in relay circuits instead of the
number of contacts. With suitable tape and card file, it can 
solve a variety of other similar problems. 

The Synthesizer represents a first step toward
machine design of switching circuits. Unfortunately, although
the method used in the Synthesizer may be generalized in
principle to circuits involving five or more variables, the
time for solution increases at an alarming rate. With five
variables it would take many thousand times as long to obtain
a solution. The card file and the tape would be about two
thousand times their present size and would require many
man-years to construct. Consequently, a direct generalization
of the Synthesizer is hardly indicated, even with the high
speeds available in electronic computing gear.

Speed of Solution With Random Problems 

An idea of the time required for the Synthesizer
to solve problems may be obtained from some tests with random
settings of the input switches. Using a book of random
numbers, ten sets of sixteen random binary digits were
obtained. These were set up as input switch settings, using 0
to mean closed and 1 open, and the time required for the
machine to solve each of these problems was measured. The
following table gives the results of this test.


(The sixteen random switch settings for each problem are not legible in this copy.)

Solution     Transformation              No. of     Time of
Circuit No.                              Contacts   Solution

#279         w'→w   x'→z   y→y    z'→x      8        4 min 10 sec
#177         w'→x   x→y    y→z    z→w       6        1 min 10 sec
#306         w→z    x'→y   y'→w   z'→x     10        7 min 20 sec
#261         w→z    x'→w   y→y    z'→x     10        7 min 7 sec
#212         w→x    x'→w   y'→y   z→z      11        9 min 6 sec
#137         w→w    x'→x   y→y    z→z       9        6 min 32 sec
#75          w→x    x→z    y'→y   z→w       9        6 min 10 sec
#240         w'→y   x'→w   y'→x   z→z       5        35 sec
#193         w→z    x'→y   y→w    z→x       8        4 min 30 sec
#34          w→x    x'→z   y→w    z→y       9        5 min 50 sec

The Solution Circuit Number refers to the table
in MM-52-180-45, E. F. Moore, "A Table of Four Relay Two
Terminal Contact Networks." The Transformation indicates the
required change of variables in interpreting the numbered
circuit of this table. The average solution time for these
ten completely specified random functions was 5 min 15 sec,
and the average number of contacts in the solution was 8.5.
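As a check on the arithmetic, the two averages can be recomputed from the ten rows of the table, with times converted to seconds. The 8-contact entry for circuit #193 is inferred from the stated 8.5 average, because that figure is not legible in the scan. This is a sketch under that assumption.

```python
# (circuit, contacts, minutes, seconds) for the ten completely specified problems.
# The contact count of #193 is inferred from the stated 8.5 average.
RESULTS = [
    ("#279", 8, 4, 10), ("#177", 6, 1, 10), ("#306", 10, 7, 20),
    ("#261", 10, 7, 7), ("#212", 11, 9, 6), ("#137", 9, 6, 32),
    ("#75", 9, 6, 10), ("#240", 5, 0, 35), ("#193", 8, 4, 30),
    ("#34", 9, 5, 50),
]


def averages(rows):
    """Return (mean solution time as (minutes, seconds), mean contacts)."""
    total_seconds = sum(60 * m + s for _, _, m, s in rows)
    mean_seconds = round(total_seconds / len(rows))
    mean_contacts = sum(c for _, c, _, _ in rows) / len(rows)
    return divmod(mean_seconds, 60), mean_contacts


print(averages(RESULTS))  # ((5, 15), 8.5)
```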

A second test was run with partially specified
random functions. Again using the table of random numbers,
four switches were chosen at random for "don't care" settings,
the remaining switches being given random "open" or "closed"
settings. This was done four times, leading to the following
results:

(D denotes a "don't care" setting. The switch settings for these four problems are not fully legible in this copy.)

Solution     Transformation              No. of     Time of
Circuit No.                              Contacts   Solution

#334         w→w    x→x    y→y    z→z       6        3 min 5 sec
#189         w'→w   x→z    y→y    z→x       7        6 min 30 sec
#178         w→y    x'→w   y→z    z'→x      8        7 min 25 sec
#58          w→y    x→w    y'→z   z'→x      3        12 sec

The average time of solution for these problems with four
unspecified states was 4 min 20 sec, with an average of 6
contacts.

Finally, a test was run with random problems having 
eight unspecified ("don't care") states. These results were 
as follows: 

(Table of four problems with columns Binary Digits (Switch Settings, D = Don't Care), Solution Circuit No., Transformation, No. of Contacts and Time of Solution; the entries are not legible in this copy.)

The average solution time here was 1 min 56 sec, and the
average number of contacts 4.5.


The following table summarizes these average figures:

                          Completely     Unspecified     Unspecified
                          specified      in 4 states     in 8 states

Average time              5 min 15 sec   4 min 20 sec    1 min 56 sec
Average number
  of contacts             8.5            6               4.5

With still more "don't care" states the solution time and
average number of contacts would undoubtedly decrease still
further.

General Theory of Operation 

The Relay Synthesizer deals with Boolean functions
of four variables. Each of the variables has two possible
values, 0 or 1; in conjunction there are 2^4 = 16 sets of
values, or "states," of the variables. For each of these
states, a function of these variables can be either 0 or 1.
Thus there are 2^16 = 65,536 different Boolean functions of
four variables. It is known that these 65,536 functions can be
subdivided into 402 classes, or "types," of functions. Two
functions are said to be of the same type if one may be
obtained from the other by negating some of the variables,
permuting some of the variables, or both. Thus the function

w + x'(y + z)

is of the same type as

x' + z(w + y')

and

w' + y(x' + z').

All functions of a given type present substantially the same 
design problem. If a good circuit is found for one of them, 
it applies equally to all other functions of the same type, 
for it is necessary only to relabel contacts properly and it 
will represent these other functions. 
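The figure of 402 types can be checked independently with Burnside's lemma. This check is an addition here, not the memorandum's own method.

Each of the 384 permutation-negation operations rearranges the sixteen states. A function is unchanged by an operation exactly when it has the same value on every cycle of that rearrangement, so the operation fixes 2^c functions, where c is its number of cycles. Averaging that count over all 384 operations gives the number of types. The sketch assumes a numbering in which bit i of a state holds variable i.

```python
from itertools import permutations

N_VARS = 4
N_STATES = 1 << N_VARS  # 16 states of four two-valued variables


def state_map(perm, neg):
    """State rearrangement for one operation (assumed numbering: bit i of a
    state is variable i). Variable i moves to position perm[i]; then the
    variables whose bits are set in neg are negated."""
    result = []
    for s in range(N_STATES):
        t = 0
        for i in range(N_VARS):
            t |= ((s >> i) & 1) << perm[i]
        result.append(t ^ neg)
    return result


def cycle_count(mapping):
    """Number of cycles of a permutation given as a list."""
    seen = [False] * len(mapping)
    cycles = 0
    for start in range(len(mapping)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = mapping[j]
    return cycles


def count_types():
    """Return (number of operations, number of types) via Burnside's lemma."""
    ops = 0
    fixed_total = 0
    for perm in permutations(range(N_VARS)):
        for neg in range(1 << N_VARS):
            fixed_total += 2 ** cycle_count(state_map(perm, neg))
            ops += 1
    return ops, fixed_total // ops


print(count_types())  # (384, 402)
```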


In the memorandum referred to above, circuits are
given for these 402 types of functions. At present writing,
331 of these have been proved to be minimal in contacts; the
remaining 71 are known to be within one contact of being
minimal. This catalog of circuits is a key part of the design
procedure in the Relay Synthesizer.

The reader may wonder why the Synthesizer is
necessary for designing circuits when such a catalog is
available. Why not merely find the circuit corresponding to
the desired function in the catalog? The answer is that it is
not at all easy to find the type or class to which a given
function belongs, even when the function is completely
specified. If the desired function is not completely specified
(has one or more "don't care" states), there will in general be
many types of functions consistent with the requirements, and
it becomes extremely difficult to locate these in the catalog.
The Synthesizer is, in fact, a machine for determining the type
of a fully specified function and (in the partially specified
case) the possible type having the least number of contacts in
its catalog circuit.
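The difficulty is that a function's type cannot be read off its truth table; it appears only after trying the operations. One brute-force way to label a type, offered here as an illustration and not as the Synthesizer's method, is to take the smallest 16-bit truth table reachable under the 384 operations. The state numbering s = 8w + 4x + 2y + z is an assumption. The sketch confirms that the three example functions given earlier share a type.

```python
from itertools import permutations


def truth_table(f):
    """16-bit closed-state pattern of f(w, x, y, z); bit s is set when f is
    closed in state s = 8w + 4x + 2y + z (a numbering assumed here)."""
    table = 0
    for s in range(16):
        w, x, y, z = (s >> 3) & 1, (s >> 2) & 1, (s >> 1) & 1, s & 1
        if f(w, x, y, z):
            table |= 1 << s
    return table


def transform(table, perm, neg):
    """Apply one permutation-negation operation to a truth table."""
    out = 0
    for s in range(16):
        if (table >> s) & 1:
            t = 0
            for i in range(4):
                t |= ((s >> i) & 1) << perm[i]
            out |= 1 << (t ^ neg)
    return out


def type_label(table):
    """Smallest table reachable by the 384 operations; equal labels mean
    equal types."""
    return min(transform(table, p, n)
               for p in permutations(range(4)) for n in range(16))


f1 = lambda w, x, y, z: w or (not x and (y or z))          # w + x'(y + z)
f2 = lambda w, x, y, z: not x or (z and (w or not y))      # x' + z(w + y')
f3 = lambda w, x, y, z: not w or (y and (not x or not z))  # w' + y(x' + z')

labels = {type_label(truth_table(f)) for f in (f1, f2, f3)}
print(len(labels))  # 1
```

Taking the minimum works because the 384 operations form a group: every member of a type reaches exactly the same set of tables, and therefore the same smallest one.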

A block diagram of the Synthesizer is shown in 
Figure 1, and indicates the main functional organization. The 
specifications of the desired circuit are set up on the input 
switches in the right-hand box. The catalog of the 402 types 
of functions appears on a paper tape in the left-hand Tape 
Input box. Each function occupies six lines of tape. The 
first four lines give the states for which the function is 
closed. The fifth line gives the number (in binary form) of 
closed states for the function, and the sixth line contains a 
special hole marking the end of data relating to this function, 
i.e., it acts as a punctuation mark separating functions on 
the tape. 
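The six-line layout can be sketched as follows. Some details are assumptions. The hole names R0 to R3 and RS follow the tape-reading relays. Hole Rj on data line k is taken to stand for state 4k + j, following the reading order described later (first line into M0 to M3, and so on). R0 to R3 are taken to give the binary digits 1, 2, 4, 8 of the count. With only four data holes, a count of 16 is left unpunched here.

```python
def encode_function(closed_states):
    """Six tape lines (sets of punched holes) for one catalog function.

    closed_states: set of states 0..15 in which the function is closed.
    Hole names and the state-to-hole assignment are assumptions.
    """
    if not all(0 <= s < 16 for s in closed_states):
        raise ValueError("states must lie in 0..15")
    # Lines 1-4: hole Rj on line k marks state 4k + j as closed.
    lines = [{f"R{j}" for j in range(4) if 4 * k + j in closed_states}
             for k in range(4)]
    # Line 5: number of closed states in binary; 16 punches no data holes.
    count = len(closed_states)
    lines.append({f"R{j}" for j in range(4) if (count >> j) & 1})
    # Line 6: punctuation hole marking the end of this function.
    lines.append({"RS"})
    return lines


for line in encode_function({0, 5, 15}):
    print(sorted(line))
```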

In solving a particular problem, the tape functions
are studied one by one in the machine. All permutations and
negations of a particular tape function are compared with the
desired specifications as set up on the input switches. When
an exact match is found the machine stops, and the tape
function, together with the permutation being applied to it,
represents a solution to the problem.
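In software terms, the test described above is a search over the 384 operations for one whose transformed tape function agrees with every specified input switch. The sketch below makes several assumptions. Bit i of a state number holds variable i. Switch settings are written "closed", "open" or None for don't care, which is our notation. Negations form the outer loop and permutations the inner loop, mirroring the sequencer order.

```python
from itertools import permutations


def apply_op(closed_states, perm, neg):
    """Image of a set of closed states under one permutation-negation."""
    out = set()
    for s in closed_states:
        t = 0
        for i in range(4):
            t |= ((s >> i) & 1) << perm[i]
        out.add(t ^ neg)
    return out


def agrees(closed_states, settings):
    """True if no specified switch disagrees; None means don't care."""
    for state, setting in enumerate(settings):
        if setting is not None and (state in closed_states) != (setting == "closed"):
            return False
    return True


def find_operation(tape_function, settings):
    """Return the first matching (perm, neg), or None if the tape function
    cannot satisfy the settings under any of the 384 operations."""
    for neg in range(16):
        for perm in permutations(range(4)):
            if agrees(apply_op(tape_function, perm, neg), settings):
                return perm, neg
    return None


# A function closed only in state 0 matches "closed only in state 15"
# once all four variables are negated.
print(find_operation({0}, ["open"] * 15 + ["closed"]))  # ((0, 1, 2, 3), 15)
```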

In the block diagram this is carried out as follows:
The tape function is stored in the memory relays. A
permuting-negating network applies the equivalent of the
various possible permutation and negation operations to these
data. The results of each permutation-negation operation are
compared with the input switches in a comparison circuit to
see if a match has occurred. If not, an error signal is fed
back to the permutation sequencer, causing it to advance to the
next permutation operation, which is in turn compared, and so
on, until all of the 384 possible permutations and negations
have been tested. Because of short-cut circuits to be described
later, the machine frequently skips many of these, reducing the
solution time.

When the set of operations on a particular function 
is exhausted, the permutation sequencer sends a signal back 
to the tape driving circuit, and the next function is read 
into the memory for test. This signal also causes the card 
display device to drop another card from its stack. The card 
displayed always corresponds to the function being tested in 
the machine and shows the most efficient known circuit for
that function. 

The permutation indicator is controlled by the per- 
mutation sequencer and indicates in lights the permutation 
currently being tested. When the machine stops at a solution, 
these lights show what permutation and negation must be ap- 
plied to the circuit on the card to solve the problem at hand. 

In the problems involving "don't cares," the
Synthesizer could be used to find all of the solutions in
succession, but to use all this information in designing a
circuit it would be necessary to compare all the circuits
obtained and see which one is preferred. Since the grounds for
preferring one circuit over another have been taken to be
economy of contacts, the need for this comparison step has been
eliminated by arranging the functions on the tape in order of
increasing number of contacts, so that the first solution
arrived at will automatically be the preferred one. Arranging
the functions on the tape according to any other criterion will
cause the Synthesizer to design circuits based on that
criterion. If, for instance, it is desired to design relay
circuits using as few springs as possible, or to design diode
logic circuits using as few diodes as possible, it is only
necessary to arrange the functions on the tape in order of
number of springs or number of diodes.
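The ordering argument fits in a few lines. If the catalog is sorted by whatever cost is being minimized, the first entry that satisfies the specification is the cheapest one. The catalog entries, cost fields and matching predicate below are hypothetical stand-ins for the tape and the match test.

```python
from typing import Callable, Optional


def first_solution(catalog: list, matches: Callable[[dict], bool],
                   cost_key: str) -> Optional[dict]:
    """Scan entries in order of increasing cost; the first hit is optimal."""
    for entry in sorted(catalog, key=lambda e: e[cost_key]):
        if matches(entry):
            return entry
    return None


# Hypothetical catalog: #A and #B both satisfy the problem, #C does not.
catalog = [
    {"circuit": "#A", "contacts": 9, "diodes": 3, "ok": True},
    {"circuit": "#B", "contacts": 7, "diodes": 6, "ok": True},
    {"circuit": "#C", "contacts": 5, "diodes": 2, "ok": False},
]
ok = lambda e: e["ok"]
print(first_solution(catalog, ok, "contacts")["circuit"])  # #B
print(first_solution(catalog, ok, "diodes")["circuit"])    # #A
```

Changing only the sort key changes which circuit is reported first, just as re-punching the tape in a different order does.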

Circuit Operation 

Figure 2 is the circuit diagram of the Synthesizer.
The layout of subcircuits corresponds roughly to the block
diagram, Figure 1. We will first describe the circuit
operation in the logically simplest mode of operation, the
normal mode with all short-cut circuits eliminated. In
Figure 2, then, we assume the mode of operation switch in the
"Normal" position N, the relay Q operated (eliminating the
permutation short cuts) and the number of state switches M set
at "Normal."

Since the Synthesizer is essentially a closed-loop
system, it is difficult to find a point at which to start a
description of its operation. It is perhaps simplest to
assume that the machine has just finished testing one function
on the tape. The relay H may then be assumed to have just
operated, locking in through the make contact on R_s, since the
tape reader will be at the division line between functions and
consequently R_s is operated. Operation of H releases the hold
on the memory relays (M0, M1, ..., M15) and also the hold on
the steering counter relays (its three WZ pairs), thus
resetting this counter to zero. It also applies voltage to the
teletype magnet which, a moment later, will pull free of the
tape and hence release R_s. This releases H and reconnects the
holds of the steering counter and the memory relays. It also
establishes a path to the slow relay SO through its own back
contact. SO now acts like a slow buzzer, producing pulses at a
rate of about six per second, and relay U follows these pulses
through the SO make contact.

The pulses produced by U operate the teletype magnet,
advancing it line by line until it reaches the line with an
R_s hole, at which point the back contacts on R_s open both the
buzzer circuit to SO and the teletype magnet circuit through U.
The pulses produced by U are also fed into the three-stage
binary counter consisting of three WZ pulse dividers, W_M1 Z_M1,
W_M2 Z_M2 and W_M3 Z_M3. This counter, therefore, keeps track of
the line of tape, counting from the last division between two
tape functions (R_s hole). It controls the steering trees
leading into the memory relays M0, M1, ..., M15 and the number
of state relays V1, V2, V4, V8. The first line of tape after the
R_s line is fed into M0, M1, M2, M3, the second line into M4,
M5, M6, M7, the third into M8, M9, M10, M11, the fourth into
M12, M13, M14, M15, and the fifth into V1, V2, V4, V8. A section
of the tape is shown in Figure 3.

The completion of this tape reading operation,
indicated by closure of R_s, puts ground on lead 106 leading
into the permutation-negation network.


Permuting and Negating Circuits 

These circuits enable the machine to apply the
384 negation and permutation operations to the tape function
stored in the memory, in order to compare it with the desired
function set on the input switches.

The negation-permutation sequencer consists of nine
WZ pairs connected in a form of counting circuit which can go
through 384 different states. Starting from the high-speed
(pulsed) end of this circuit, the first five WZ pairs, E, D, B,
C and A, relate to permutations and can go through twenty-four
states corresponding to the 4! = 24 permutations of the four
variables. The other four stages, w, x, y and z, relate to
negating the variables and can go through sixteen states
corresponding to the sixteen ways of negating four variables.
In combination this gives 16 x 24 = 384 states.
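In effect, the nine stages form a mixed-radix counter. A fast digit takes 24 values for the permutation, and a slow digit takes 16 values for the negation. The sketch below shows only the resulting order, with integer indices standing in for relay states; the actual relay codes are those of Tables I and II.

```python
def sequencer_states():
    """Yield (negation, permutation) indices in sequencer order.

    The permutation index (0..23) advances on every step; the negation
    index (0..15) advances once each time the permutations wrap around.
    """
    for step in range(16 * 24):
        negation, permutation = divmod(step, 24)
        yield negation, permutation


states = list(sequencer_states())
print(len(states), states[23], states[24])  # 384 (0, 23) (1, 0)
```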

In the circuit, imagine Q operated, F9 and F16
released, and F3 pulsed, so that a series of pulses is
applied to line 109. The negation-permutation sequencer will
then proceed through the 384 negation-permutation operations.
This sequence is shown in the accompanying Table I for the
first twenty-four of these, i.e., a full set of permutations.
At the twenty-fourth step this sequence repeats for the
permutation relays, but a pulse is applied at lead 250,
advancing the negating relays one step. The negating relays go
through the sequence shown in Table II, advancing one step
after the permuting relays have gone through a full set of
permutations. In this manner the full set of 16 x 24
combinations is exhausted.

Table I
Sequence of Permutations

(Columns: W_A, W_B, W_C, W_D, W_E (1 means operated); A, B, C, D; W X Y Z. The twenty-four rows are not legible in this copy.)

Table II
Sequence of Negations

(Columns: relays W_w, W_x, W_y, W_z; relays W, X, Y, Z; and what the variables w, x, y, z become. The sixteen rows are not legible in this copy.)

At the end of this sequence, a ground is applied 
to line 135 which initiates reading in a new function. 

It may also be noted that if relay Q is released
and F16 is operated, a ground is applied directly to line 250,
the input to the negating part of the counter. This will
cause the counter to skip a set of permutations and advance
directly in the negating sequence by one step. Operation of
F16 also releases the plus side of the permutation relays in
the sequencer, resetting them to zero. The function of F16
is to short-cut some of the calculation in certain cases, as
will be described later.

In a similar way, operation of F9 with Q released
advances the W_A and W_B parts of the permutation sequence by
one step, skipping a subset of six permutations in which
W_C, W_D and W_E take part. F9 releases the plus to these three
WZ pairs, resetting them to zero. This also is used for
short-cut purposes.
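The permutation count can be treated as four groups of six. The A, B part selects the group, and the C, D, E part selects the position within it. On that reading, the two short cuts become simple jumps in the counter. This is a sketch with hypothetical class and method names; it models counter values, not relay wiring.

```python
class Sequencer:
    """Counter over 16 negations x 24 permutations, the permutations split
    into 4 groups (A, B part) of 6 (C, D, E part). Hypothetical model."""

    def __init__(self):
        self.neg, self.ab, self.cde = 0, 0, 0

    def done(self):
        return self.neg == 16

    def step(self):
        """Ordinary advance by one permutation, carrying into A, B."""
        self.cde += 1
        if self.cde == 6:
            self.skip_subset()

    def skip_subset(self):
        """Like F9: reset the C, D, E part and advance the A, B part."""
        self.cde = 0
        self.ab += 1
        if self.ab == 4:
            self.skip_negation()

    def skip_negation(self):
        """Like F16: reset all permutation stages and advance the negation."""
        self.cde, self.ab = 0, 0
        self.neg += 1


def steps_to_finish(action):
    """Count how many advances of one kind exhaust the sequence."""
    seq = Sequencer()
    count = 0
    while not seq.done():
        action(seq)
        count += 1
    return count


print(steps_to_finish(Sequencer.step),           # 384
      steps_to_finish(Sequencer.skip_subset),    # 64
      steps_to_finish(Sequencer.skip_negation))  # 16
```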

The permuting and negating relays A, B, C, D, E and
W, X, Y, Z are operated from back contacts of the
corresponding W relays in the WZ pairs of the sequencer. Thus
they assume the complementary states, as shown in Tables I and
II. The function of these nine sets of relays is to interchange
the sixteen leads representing the function in the memory
relays in accordance with the permutation and negation in the
sequencer.

The logical organization of this circuit can be
represented in a symbolic form by Figure 4, which indicates
the effect of the negating and permuting relays on the
variables of the tape function (not the effect on the sixteen
leads). Thus, the W relay negates the variable W when released,
the X relay negates X, and so on. The A relay interchanges W and
X and also Y and Z when released; the B relay interchanges the
variables now appearing (after the possible A interchange) on
the first and third lines, etc. It will be found that the
twenty-four combinations of A, B, C, D and E produced by the
sequencer (Table I) lead to the twenty-four permutations of the
four variables, as shown in Table I.

Now the circuit does not work with the four Boolean
variables but with sixteen lines representing the sixteen
states of the four variables. Negating a variable, say W,
corresponds to interchanging the eight lines (or states) for
which W is 1 with the corresponding eight lines for which W
is 0. Thus in the permuting circuit, the W negation box
of Figure 4 becomes eight reversing or interchanging circuits
operated by the relays W1, W2, W3, W4. A similar statement
applies to the negation of the other variables and the
permuting of the variables by the A, B, C, D and E relays.
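On the sixteen lines, negating a variable is a fixed pairing of lines, and interchanging two variables is a fixed reordering of lines. The sketch below assumes that line s carries state s = 8w + 4x + 2y + z, which is our numbering. It shows that negating W exchanges each W = 1 line with its W = 0 partner. It also lists the lines left in place when W is interchanged with X and Y with Z, as the A relay does when released.

```python
W, X, Y, Z = 3, 2, 1, 0  # assumed bit position of each variable in a line number


def negate(lines, var):
    """Negate one variable: line s takes the value of the line that differs
    from s only in that variable, so eight pairs of lines are exchanged."""
    return [lines[s ^ (1 << var)] for s in range(16)]


def interchange(lines, a, b):
    """Interchange two variables: line s takes the value of the line whose
    bits for a and b are swapped."""
    out = []
    for s in range(16):
        bit_a, bit_b = (s >> a) & 1, (s >> b) & 1
        t = (s & ~((1 << a) | (1 << b))) | (bit_a << b) | (bit_b << a)
        out.append(lines[t])
    return out


lines = list(range(16))                                   # label each line by its number
print(negate(lines, W)[:4])                               # [8, 9, 10, 11]
a_released = interchange(interchange(lines, W, X), Y, Z)  # like relay A released
print([s for s in range(16) if a_released[s] == s])       # [0, 3, 12, 15]
```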


To summarize, the sequencer can go through 384
states representing the 384 permutations and negations. The
negating-permuting network sets up the corresponding
interchanges of the sixteen lines from the memory to the input
switches. At the memory end, these lines are given plus or
minus voltage according as the memory function is open or
closed. At the input switch end, after the permutation and
negation, these voltages are compared with the settings of
the input switches.

There are two types of comparison circuits. The
first type, Figure 5, applies to switches 0, 7, 8 and 15.
It will be seen that F0 will operate if the lead from the
permuting network is positive and the switch is set at
"closed," or if the lead is negative and the switch is set at
"open," i.e., if there is a disagreement between the switch
setting and the value coming in from the permuting network. If
the switch is set at "don't care," F0 will not operate. It will
also be seen that the red and green lights will indicate
"closed" and "open" settings of the switch respectively, while
if the switch is set at "don't care" the red or green light will
indicate minus or plus coming in from the permuting network.

The comparison circuit for the other switches is
somewhat different. There are two relays, F1 and F2, common
to all the other switches. If a particular switch is set at
"closed," the line from the permuter goes through a diode
to F1, the other side of F1 being minus (when the test is
made). Thus F1 will operate if a plus appears on the line
from the permuter (disagreeing with the "closed" position
of the switch). If the switch is set at "open," the path
from the permuter goes through the same diode but in the
opposite direction to F2, whose other side is connected to
plus. Hence F2 will operate if a minus comes in from the
permuter. The red and green lamps are connected substantially
as before.
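Both comparison circuits apply the same rule, summarized in the sketch below as a boolean model of the wiring. The model is ours. As stated above, a minus line means closed and a plus line means open. A failure relay operates only when a specified switch disagrees with its incoming line. Switches 0, 7, 8 and 15 have relays of their own. For the other switches, disagreements are collected on F1 (switch "closed", plus on the line) or F2 (switch "open", minus on the line).

```python
SEPARATE = {0: "F0", 7: "F7", 8: "F8", 15: "F15"}  # switches with their own relay


def failure_relays(line_closed, settings):
    """Return the set of failure relays that operate for one test.

    line_closed[s]: True if permuted line s carries minus (closed).
    settings[s]: "closed", "open" or None for don't care.
    """
    operated = set()
    for s in range(16):
        setting = settings[s]
        if setting is None or (setting == "closed") == line_closed[s]:
            continue  # don't care, or switch and line agree
        if s in SEPARATE:
            operated.add(SEPARATE[s])
        elif setting == "closed":
            operated.add("F1")  # plus on the line of a "closed" switch
        else:
            operated.add("F2")  # minus on the line of an "open" switch
    return operated
```

An empty result corresponds to the match condition, in which no F relay operates.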

Returning now to the description of the operating
sequences in the machine, we recall that the completion of
tape reading of a function into the memory was signified by
closure of R_s. This applies ground at lead 106 into a long
"equality chain" of contacts. This chain is closed only if
all of the W relays in the WZ pairs of the sequencer agree
in position with their corresponding Z relays. This being
true, ground is applied to the permuting and negating network,
and, as already described, one or more of the F relays
(F0, F7, F8, F15, F1, F2) will operate unless the tape
function as permuted through the network agrees with the input
function. Assuming there is a disagreement, at least one of
F3, F9, F16 will operate, grounding the input to the
negation-permutation sequencer. This advances the W relays
of the sequencer one step in the sequence and causes a
disagreement between at least one of the W relays and its
corresponding Z relay in the WZ pairs. This disagreement,
in turn, opens the "equality chain," releasing the F relays,
which in turn removes the ground from the sequencer and
allows its Z relays to follow their corresponding W relays.
When equality has again been established, ground is again
applied through the "equality chain" to the permuting network,
and the next permutation of the sequence (now set up on the
permuting network) is tested in the same way. This cycle of
operations continues until the full set of permutations and
negations has been tested. After the last permutation, the
next ground goes through a Z_w contact and the mode of
operation switch to operate H, signifying the completion of
tests on the current function and initiating reading the tape
for the next function, as previously described.

If, at some point, the permuted tape function 
matches the input function, no F relay will operate and the 
cycle is stopped. Relay J will operate and, in turn, L 
through the chain of back contacts on the F relays. The 
operation of L rings the gong indicating a solution, and 
pulses the message register for counting purposes. 
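The cycle just described alternates between two phases. While every W relay agrees with its Z relay, the chain is closed and one operation is tested. A failure advances the W relays, which opens the chain, and the Z relays then catch up. The sketch below is a simplified model of that handshake at the level of whole counter values. Relay timing is not modeled, and the names are hypothetical.

```python
def run_cycle(test, n_ops=384):
    """Return (index of first matching operation or None, number of tests).

    w and z stand for the settings of the W and Z relays of the sequencer.
    """
    w = z = 0
    tests = 0
    while w < n_ops:
        if w == z:               # equality chain closed: test operation w
            tests += 1
            if test(w):
                return w, tests  # no F relay operates: solution found
            w += 1               # failure: the W relays advance one step
        else:
            z = w                # chain open, F relays release, Z follows W
    return None, tests           # all tested: H operates, next tape function


print(run_cycle(lambda k: k == 5))  # (5, 6)
```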

Short-Cut Operation 

We now describe the short-cut provisions. If the 
short-cut eliminator is "off," relay Q will release, rear- 
ranging the inputs to the sequencer. In the permuting net- 
work it will be seen that the lines on the zero level and 
on the 15 level are not switched after the vertical column 
of Z contacts, i.e., after emerging from the negating part of 
this circuit. This means that if a disagreement occurs on 
either of these lines, it will persist throughout all the 
permutations, which only change the switches A, B, C, D and 
E in this network. Hence, in case of such a disagreement it 
is not necessary to test all of these permutations but the 
machine can proceed immediately to the next negation saving 
a great deal of time. 

In the circuit, when Q is released, operation of
F0 or F15 pulses directly into the negating part of the
sequencer and resets the permuting part to zero.


In a similar manner, it will be seen that the lines
at the 7 and 8 levels in the permuter are not switched after
the B contacts. This means that a disagreement on either of
these lines, indicated by operation of F7 or F8, will persist
over the subset of six permutations in which C, D and E change.
Hence it is unnecessary in such a case to test each of these
individually, and the machine advances to the next permutation
involving a change of A or B. In the sequencer, a ground is
applied at the input to the A, B stages, and the C, D, E stages
are reset to zero. This is done by relay F9, which will operate
if either F7 or F8 indicates disagreement.
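Both short cuts depend on which lines a permutation can move. The sketch assumes states are numbered s = 8w + 4x + 2y + z. This numbering fits the special roles of switches 0, 7, 8 and 15 but is not stated in the memorandum. Under it, states 0 and 15 are fixed by every permutation of the variables. States 7 and 8 are fixed by the six permutations that leave W in place, which would be the C, D, E subset if that subset permutes only X, Y and Z. The enumeration below checks both claims.

```python
from itertools import permutations


def permute_state(s, perm):
    """Move the bit of variable i (bit 3 = w, ..., bit 0 = z) to bit perm[i]."""
    t = 0
    for i in range(4):
        t |= ((s >> i) & 1) << perm[i]
    return t


ALL = list(permutations(range(4)))
KEEP_W = [p for p in ALL if p[3] == 3]  # the six permutations leaving w in place

fixed_by_all = [s for s in range(16)
                if all(permute_state(s, p) == s for p in ALL)]
fixed_by_keep_w = [s for s in range(16)
                   if all(permute_state(s, p) == s for p in KEEP_W)]
print(fixed_by_all)     # [0, 15]
print(fixed_by_keep_w)  # [0, 7, 8, 15]
```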

One further short-cutting device has been
incorporated in the machine. With each tape function is
included, in binary form, the number of states for which that
function is closed. As previously described, this number is
stored in the relays V1, V2, V4, V8, V16 when the function is
read off the tape. On the front panel of the machine are two
seventeen-point switches labeled Max and Min. The Min switch
should be set at a number equal to the number of input switches
in the "closed" position. The Max switch should be set at this
number plus the number of "don't cares." Now, regardless of
how the "don't cares" may be filled in, the number of closed
states will be within this range (including the end points).
A function from the tape could not possibly be satisfactory
unless its number of closed states lies within this range. The
machine is arranged to compare these numbers and, if this
condition is not satisfied, to skip the function completely and
go immediately to the next function on the tape.
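The test is sound because permuting and negating variables only relabels states. A tape function therefore keeps the same number of closed states under all 384 operations, so one count comparison can rule out all of them at once. A minimal sketch of the range check, using the same hypothetical "closed" / "open" / None notation for switch settings:

```python
def worth_testing(closed_count, settings):
    """Range check behind the Max and Min switches.

    closed_count: number of closed states of the tape function (0..16).
    settings: "closed", "open" or None (don't care) for each input switch.
    """
    n_min = sum(1 for s in settings if s == "closed")       # Min setting
    n_max = n_min + sum(1 for s in settings if s is None)   # Max setting
    return n_min <= closed_count <= n_max


settings = ["closed"] * 5 + [None] * 3 + ["open"] * 8
print([n for n in range(17) if worth_testing(n, settings)])  # [5, 6, 7, 8]
```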

The comparison is carried out in the "number of 
states comparison circuit". The contacts on the V relays are 
arranged in the topological dual of an ordinary tree. This 
implies that if the number n is registered (in binary form) 
in the V relays, then all of the vertical leads labeled zero 
to n at the Min switch will be connected together, but the 
two groups are not connected. It will be seen, therefore, 
that if the number on the V switches lies in the range covered 
by the Max and Min settings, then the Max and Min swingers 
will not be connected. If the V number is outside this range,
then the Max and Min swingers will be connected. If the Max
and Min swingers are connected, the operation of R_s closes a
path to operate H and start reading in a new function
immediately.

It is necessary to use five relays, V1, V2, V4, V8 and
V16, to represent all of the numbers from 0 to 16 inclusive,
but there were only four holes readily available on the tape
for reading into these relays. Consequently four of the relays
are read into directly through the steering relays, and a
special artifice is used to get the fifth digit stored in V16.

Since the only case in which this digit equals 1 is when the
number of states is 16, and the other four relays are then all
released, V16 is operated through the back contacts of V1, V2,
V4 and V8 in series. But since V1, V2, V4 and V8 are also all
released when the number of states is 0, a contact of M0 is
also included in the operate path to distinguish between these
two cases.
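The artifice reduces to one boolean condition. The sketch assumes the M0 contact in the path is a make contact, so V16 operates only when the function is closed in state 0. That is always true when all sixteen states are closed and never true when none is. The function name and argument order are hypothetical.

```python
def registered_count(v1, v2, v4, v8, m0):
    """Number of closed states held in V1, V2, V4, V8 plus the derived V16.

    V16 operates through the back contacts of V1, V2, V4, V8 in series with
    an (assumed make) contact of M0. With all four released the count is
    either 0 or 16, and M0 (state 0 closed) tells the two cases apart.
    """
    v16 = (not (v1 or v2 or v4 or v8)) and m0
    return 16 * v16 + 8 * v8 + 4 * v4 + 2 * v2 + v1


print(registered_count(False, False, False, False, True))   # 16
print(registered_count(False, False, False, False, False))  # 0
```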

Without the short-cutting features the average time 
of solution for a completely specified function would be over 
an hour; with short cuts it is about five minutes. 

Indicating Circuits 

A set of indicating lights is provided which shows 
the permutation and negation that must be applied to the tape 
function (when a solution has been found) to transform it into 
the function on the input switches. The eight negating lights 
are connected in a simple fashion to the W, X, Y and Z coils.
If the W relay is out, for example, the W' lamp lights up by
a current through the W coil (not sufficient to operate the 
W relay). If the W relay is operated, the W lamp lights up by 
current through the W w contact. 

The circuit for the permuting part is more complex.
However, on tracing through the circuits it will be found
that the lights always receive proper voltages to indicate
the permutation set up on the A, B, C, D, E relays. For
example, in the first (identity) permutation, A, B, C, D and E
are all operated. It will be seen that the eight center
points between pairs of lamps then receive plus, minus or no
voltage (floating); the voltage pattern is not legible in this
copy. Hence the diagonal series of lamps

W - - -
- X - -
- - Y -
- - - Z

will be lighted. Note that the lamps connected to floating
points receive half voltage by a sneak path through the two
lamps in series. This is not sufficient to illuminate them.

Another permutation indicating light circuit has 
been provided for trouble shooting and for better observation 
of the machine while in action. This consists of twenty-five 
small neon lamps. Twenty-four of these correspond to the 
twenty-four permutations of the variables. These are ar- 
ranged in a rectangle six wide and four high. In operation 
without short cuts, these lamps light sequentially from left 
to right across the first row, then across the second, etc. 
In short cuts due to the F0 and F15 relays the whole pattern
of twenty-four permutations is skipped. In short cuts due to
F7 and F8 a horizontal row in this display is skipped (only
the first lamp of the row going on).

The circuit controlling these lights consists of a
tree on relays A and B, which selects the row, and a second
tree on C, D and E, which selects the column. Only the lamp
at the intersection point will go on. Sneak paths through
other lamps all involve at least three lamps in series, and
the voltage is not sufficient for breakdown of such a series.

The twenty-fifth lamp is connected to light up if 
the C, D and E relays get into either of the two other pos- 
sible states which do not correspond to permutations in the 
regular sequence of operations. It can thus indicate certain 
trouble conditions. 

Other Modes of Operation 

With the mode of operation switch set in the P pos- 
ition (periodic), the machine does not advance the tape after 
the sequence of permutations and negations but periodically 
goes through the tests on the function in the memory. In this 
switch position the path to the H relay, which ordinarily ini- 
tiates the tape reading process, is open. This mode is some- 
times useful for trouble shooting. 


In the S position (step-by-step), the machine tests
a permutation and then stops until the Run switch is operated
and released. The path which normally puts ground on the
relays F3, F9 and F16 is opened and replaced by a contact on
the Run switch connected to a condenser. When the Run switch
is off, this condenser charges, and when the switch is pressed
for a step in the operation, the condenser discharges through
F3, F9 or F16. Only enough charge is stored to operate these
relays once. For the next step the Run switch must be released
and pressed again.

In the L mode (low-speed), the machine operates as 
in the normal mode except at a much lower speed. This is 
achieved in a fashion similar to the step-by-step operation 
but with the function of the Run switch replaced by relay N. 
The N relay is operated by the G relay which is connected in 
a relaxation oscillator circuit using a gas tube. The conden- 
sers charge up sufficiently to break down the gas tube which 
operates G, closing its make contact and discharging the con- 
denser which then starts recharging. This slow oscillation 
of G causes N to oscillate slowly which, in turn, allows the 
solution to proceed at a slow rate. 

In mode Q (self-restarting), the machine does not
stop at a solution but rings the gong, pulses the message 
register, and then proceeds to the next permutation or nega- 
tion in the sequence. When a solution is reached in this mode, 
the operation of relay L causes the message register to operate. 
This releases relay £ which releases the message register and 
also applies voltage to slow-operate relay G. Operation of G 
energizes N, which in turn advances the permutation sequencer 
one step and also energizes K. K locks in, releasing G and, in
turn, N, and the solution proceeds. This mode of operation
can be used to find all of the solutions to the given problem, 
rather than just the first one. 




Appendices A and B 

Photographs 214140 through 214143 

Figures 1 through 5 


Appendix A 
Main Components and Their Functions 

Relays and Other Electromagnetic Components 

M0, M1, ..., M15
    Memory relays. These register the values of the function
    read off the tape for its sixteen possible states. If Mi is
    operated, the function is closed in state i.

W1, W2, W3, W4
    Four parallel relays (to give sufficient contacts). These
    relays negate the variable W of the tape function. This is
    done in the negating and permuting network by interchanging
    the eight leads corresponding to the variable W = 1 with the
    corresponding eight leads for which the variable W is zero.

X1, X2, X3, X4
    Similar negating relays for the variable X.

Y1, Y2, Y3, Y4
    Similar negating relays for the variable Y.

Z1, Z2, Z3, Z4
    Similar negating relays for the variable Z.

A, B1 to B4, C, D1 to D4, E
    Permuting relays. The function of these relays is to
    permute the sixteen lines from the memory relays according
    to the various permutations of the variables W, X, Y and Z
    in the tape function. By suitable combinations of operation
    and release of these five sets of relays, the interchanges
    corresponding to any of the twenty-four permutations are
    possible.

W_w Z_w, W_x Z_x, W_y Z_y, W_z Z_z, W_A Z_A, W_B Z_B, W_C Z_C,
W_D Z_D, W_E Z_E
    WZ relays arranged in a counting circuit to go through the
    384 permutations and negations applied to the sixteen leads
    in the permuter. These WZ pairs control the preceding
    W, X, ..., E relays; thus W1, W2, W3, W4 are controlled by
    the W relay of the W_w Z_w pair.


F0, F7, F8, F15 Failure relays. Operation of F0, for example, corresponds to failure of the permuted line coming into the switch to match the value on input switch I0. Operation of a failure relay causes the machine to proceed to try another permutation or tape function.

F1, F2 These are failure relays which are operated by a failure to match on any of the other switches not taken care of specifically by F0, F7, F8 or F15.

F3, ... Secondary failure relays. These are operated by the preceding failure relays and sort out the type of short cut (if any) available. One of them causes the permuter to advance to the next negation (skipping all permutations of the current negation); a second causes the permuter to skip the current subset of six permutations out of the twenty-four, advancing the AB part of the permutation one unit; the third causes an advance of only one in the permutation.

R0, R1, R2, R3, RE These relays are controlled by the five fingers of the tape reading mechanism. For example, a hole in the 2 row of the tape operates R2. R0, R1, R2, R3 carry information to the memory relays M0, ..., M15 and also to the number of state relays V1, V2, V4, V8. RE marks the end of data relating to one function on the tape.

S1, S2, S3, S4 Steering relays. These relays steer, by means of four trees, the tape readings on R0, R1, R2, R3 into the memory relays and the number of state relays V1, V2, V4, V8.


W1Z1, W2Z2 These WZ pairs sequence the steering for successive lines of tape into the appropriate memory and number of state relays.

V1, V2, V4, V8, V16 Number of state relays. These relays register in binary form the number of states for which the function currently in the memory relays is closed.

WS ZS A WZ pair for operating the card display unit. It causes successive functions on the tape to operate alternately the right and left solenoids Sr and Sl of the display unit.

Sr, Sl Right and left solenoids of the display unit for releasing cards one by one from the stack.

H End-of-permutations relay. This operates when the machine has tested all permutations of the current tape function, and initiates analysis of the next function on the tape.

L Success relay. This operates when the machine finds a solution to the problem.

Q Short cut eliminator. When operated, this relay eliminates short cuts in the permutation sequence.

J A delaying and checking relay in the basic closed loop of the system. J operates when all of the WZ pairs in the permutation counter are in agreement.

SO Slow-operate relay in a buzzer circuit for producing pulses to step the tape via relay U.

U Secondary relay operated by SO.


Reed relay in a slow relaxation oscillator circuit for controlling low-speed operation via secondary relay H.

Secondary relay controlled by G.

Control relay relating to low-speed and self-restarting modes of operation.

Message register for counting solutions to a problem.

A relay for connecting the 110 volt supply only when the 24 volt supply is

A bell operated by L which sounds when a solution is found.

A five-hole teletype tape transmitter. The standard functions are arranged on tape in order of increasing numbers of

Appendix B 
Manually Operated Switches 

Problem input switches. These switches have three positions, "open," "don't care," and "closed," and are set to correspond to the desired characteristics of the circuit to be designed in its sixteen states.

Mode of operation switch. This is a five-position switch which determines the mode of operation of the machine. In clockwise order these modes are:

P = Periodic. It continues cycling through the same permutations without advancing to the next function.

Q = Step-by-step. In this mode the machine tests the permutations one at a time under control of the key switch. This switch must be pressed once for each permutation.

N = Normal operation. Runs at regular speed to the first solution and then stops.

S = Self-restarting. At each solution, it rings the gong and adds a count to the message register, and then advances to the next solution.

L = Low-speed. Similar to normal, but at low speed for demonstration and test purposes.

Short cut eliminator. In the "On" position this switch operates relay Q and eliminates short cuts in the permuting sequence.

Next function button. Pressing this pushbutton operates relay H, causing the machine to advance to the next function on the tape, omitting any remaining permutations of the current function.


Starts the machine operating by closing its fundamental operating feedback loop.

Turns power on for the machine.

Both of these switches have seventeen points labeled 0, 1, 2, ..., 16; the Min switch has an additional point labeled "Normal". In use, the Min switch is set at the number of states for which the function to be designed is closed. The Max switch is set at this number plus the number of "don't care" states. The machine then skips functions from the tape whose number of closed states does not lie in this range, thus shortening the solution time. If the Min switch is set at "Normal" this shortening feature is
April 3, 1954 

Both experience and intuition suggest that a function of time f(t) which is bounded in amplitude range ($|f(t)| \le A$) and in bandwidth (the spectrum vanishes for angular frequencies greater than $\omega_0$) has bounded slope, a bounded second derivative, etc., and that there is a certain minimum time required to go from a maximum negative to a maximum positive amplitude. Indeed, one feels that the maximum slopes, and higher derivatives, and the fastest rise times will occur with a sine wave having the highest allowed amplitude and the highest allowed frequency. This note establishes some theorems of this general sort.

Theorem I: Let the function f(t), of integrable square, be both amplitude limited and band limited:

$$|f(t)| \le A \quad \text{all } t, \qquad F(\omega) = 0 \quad |\omega| > \omega_0,$$

where $F(\omega)$ is the Fourier transform of f(t). Then

$$|f'(t)| \le A\omega_0, \qquad |f''(t)| \le A\omega_0^2, \qquad \ldots, \qquad |f^{(n)}(t)| \le A\omega_0^n \qquad \text{all } t.$$



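Proof: If we can prove the theorem for a particular t, it will follow for all t, since we can shift f(t) along the time axis without affecting the assumptions of the theorem or its conclusions. We will prove the theorem for the particular time $t_0 = \pi/(2\omega_0)$. Now apply the sampling theorem to f(t), expanding it in terms of its samples:

$$f(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\sin(\omega_0 t - n\pi)}{\omega_0 t - n\pi},$$

$$f'(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\omega_0(\omega_0 t - n\pi)\cos(\omega_0 t - n\pi) - \omega_0\sin(\omega_0 t - n\pi)}{(\omega_0 t - n\pi)^2}.$$

At $t = t_0$ we have $\omega_0 t_0 - n\pi = -\pi(n - 1/2)$, the cosine vanishes and $\sin(\omega_0 t_0 - n\pi) = (-1)^n$, so that

$$|f'(t_0)| = \frac{\omega_0}{\pi^2}\left|\sum_{n=-\infty}^{\infty}\frac{(-1)^n a_n}{(n - 1/2)^2}\right| \le \frac{\omega_0}{\pi^2}\sum_{n=-\infty}^{\infty}\frac{|a_n|}{(n - 1/2)^2},$$

since the absolute value on $a_n$ makes all terms positive. Now $a_n$ is the value of f(t) at $t = n\pi/\omega_0$; consequently $|a_n| \le A$. Hence

$$|f'(t_0)| \le \frac{A\omega_0}{\pi^2}\sum_{n=-\infty}^{\infty}\frac{1}{(n - 1/2)^2} = \frac{A\omega_0}{\pi^2}\cdot\pi^2 = A\omega_0.$$

This proves the desired result for the first derivative. The results for higher derivatives can be obtained inductively. f'(t) is band-limited, of integrable square, and, as we have just shown, amplitude limited to $A\omega_0$. Hence f''(t) will be amplitude limited by

$$|f''(t)| \le (A\omega_0)\,\omega_0 = A\omega_0^2,$$

and by obvious induction

$$|f^{(n)}(t)| \le A\omega_0^n.$$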
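The only numerical fact the proof of Theorem I uses is the identity $\sum_{n=-\infty}^{\infty} (n - 1/2)^{-2} = \pi^2$, which follows from $\sum 1/m^2 = \pi^2/8$ over the positive odd integers m. As a quick sanity check (not part of the original note), the following Python snippet sums the series symmetrically; the truncation point N is an arbitrary choice.

```python
import math


def half_integer_sum(N):
    """Partial sum of 1/(n - 1/2)^2 for n = -N, ..., N (approaches pi^2 from below)."""
    return math.fsum(1.0 / (n - 0.5) ** 2 for n in range(-N, N + 1))


def derivative_bound_factor(N):
    """(1/pi^2) times the partial sum: the factor multiplying A*w0 in the bound on |f'(t0)|."""
    return half_integer_sum(N) / math.pi ** 2
```

The missing tail is roughly 2/N, so the factor approaches 1 and the bound $A\omega_0$ is recovered.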

It will be noted that these bounds are the maximum derivatives that would be obtained for a sine wave of the highest allowed amplitude and frequency, $f(t) = A\sin\omega_0 t$. While such a wave does not satisfy our integrable square assumption, it is possible to approximate the bounds given as closely as desired by taking a sine wave of nearly top frequency and nearly top amplitude and multiplying it by a very slowly decaying function of the type $(\sin kt)/kt$ (k very small). This produces a function satisfying all the conditions with maximum derivatives approximating to the upper bounds given. Consequently these bounds are the best possible.


We now consider the problem of total rise of a function over an interval. Again we would conjecture that the shortest time for a rise from negative peak to positive peak amplitude would be obtained by use of a sine wave of the greatest allowed frequency and amplitude and hence would be $\pi/\omega_0$ seconds. We have not been able to prove a result quite this good but will show the following.

Theorem II: Under the same conditions on f(t) as in Theorem I, it takes at least $3\tfrac{1}{12}/\omega_0$ seconds for f(t) to change from −A to +A.

Proof: We will show that if $f(0) = -A$ and $f(t_3) = +A$, then $f'(t)$ for $0 \le t \le t_3$ lies always under or on the curve g(t) shown in Figure 1. This curve consists of five sections: a straight line segment of slope $A\omega_0^2$; a parabolic segment whose second derivative is $-A\omega_0^3$ and which is tangent to the first segment and to the third segment; and a horizontal straight line at height $A\omega_0$. The last two segments are reflections of the first two.

In the first place, if $f(0) = -A$, then $f'(0) = 0$, for f(t) is an entire function because of the band limitation, and if $f'(0)$ were not equal to zero, f(t) would run outside its amplitude limit A in the neighborhood of zero. We therefore have

$$f'(t) = f'(0) + \int_0^t f''(\tau)\,d\tau \le 0 + \int_0^t |f''(\tau)|\,d\tau \le \int_0^t A\omega_0^2\,d\tau = A\omega_0^2 t.$$

Hence f'(t) lies under or on the sloping straight line section. Also $f'(t) \le A\omega_0$, so it lies under the horizontal segment. Next we show that it cannot lie in the small triangular shaped region T. Suppose in contradiction that f'(t) did lie in this region, passing through a point p at $t = t_0$ as shown. At $t_0$ we have either (A) $f''(t_0) \ge g'(t_0)$ or (B) $f''(t_0) < g'(t_0)$.

Assume first case (A), and let $t_1$ and $t_2$ be the points at which the parabolic segment joins the sloping line and the horizontal line respectively (see Figure 1). We may write

$$f'(t_2) = f'(t_0) + (t_2 - t_0)\,f''(t_0) + \int_{t_0}^{t_2}\!\int_{t_0}^{t} f'''(\tau)\,d\tau\,dt. \qquad (1)$$

We also have

$$g(t_2) = g(t_0) + (t_2 - t_0)\,g'(t_0) + \int_{t_0}^{t_2}\!\int_{t_0}^{t} g''(\tau)\,d\tau\,dt. \qquad (2)$$

The three right-hand members of (1) dominate the corresponding members of (2). $f'(t_0) > g(t_0)$ since we assumed $f'(t_0)$ in the triangular region. $f''(t_0) \ge g'(t_0)$ since we are assuming case (A). $f'''(t) \ge g''(t)$ since the g curve has the greatest negative second derivative allowed by Theorem I. We conclude that $f'(t_2) > g(t_2)$, and the f' curve is over the horizontal line at $t_2$, a contradiction which excludes case (A).

A similar argument applies to case (B) working backward to the point $t_1$. In equations (1) and (2), read $t_1$ for $t_2$ and notice that the coefficient $(t_1 - t_0)$ now becomes negative. This allows the same argument to go through with the condition reversed on the relation of $f''(t_0)$ and $g'(t_0)$, and the resulting contradiction excludes case (B), which shows the impossibility of a curve in the triangular region.
An exactly similar argument working backward from $t_3$ shows that f'(t) must lie under or on the right-hand sloping line and curved segment. Now if f'(t) is always under g(t), the area under f'(t) is no greater than the area under g(t). In order that f(t) run from −A at 0 to +A at $t_3$ the area under f'(t) must be at least 2A, and hence so must that under g(t). A simple integration of the g(t) curve shows that this requires $t_3 \ge 3\tfrac{1}{12}/\omega_0$. This proves the desired result.
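The "simple integration" can be carried out exactly. In normalized units (A = 1, ω0 = 1) the sloping line is g = t, the parabola has g'' = −1 and touches the horizontal line g = 1 at t2, and tangency to the line forces t2 − t1 = 1 and t1 = 1/2. The Python sketch below uses exact rational arithmetic and variable names of our own choosing. It reproduces the bound $t_3 \ge 37/12$ (in units of $1/\omega_0$), slightly below the conjectured π.

```python
from fractions import Fraction


def rise_time_bound():
    """Lower bound on t3 (units of 1/w0) from the area under g(t), with A = w0 = 1."""
    slope = Fraction(1)        # A*w0^2: slope of the first segment
    curvature = Fraction(1)    # A*w0^3: magnitude of g'' on the parabolic segment
    height = Fraction(1)       # A*w0: height of the horizontal segment
    needed_area = Fraction(2)  # f must rise by 2A
    # Parabola g = height - curvature*(t - t2)^2 / 2 meets the line g = slope*t tangentially:
    # equal slopes give t2 - t1 = slope/curvature, equal heights then give t1.
    d = slope / curvature
    t1 = (height - curvature * d ** 2 / 2) / slope
    t2 = t1 + d
    area_line = slope * t1 ** 2 / 2                   # triangle under g = slope*t on [0, t1]
    area_parab = height * d - curvature * d ** 3 / 6  # parabola integrated over [t1, t2]
    rising_area = area_line + area_parab              # area under one end section
    flat_length = (needed_area - 2 * rising_area) / height
    return 2 * t2 + flat_length, t1, t2, rising_area
```

Each end section covers 23/24 of the needed area in 3/2 time units, and the horizontal segment must supply the remaining 1/12.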

It would no doubt be possible to improve the value $3\tfrac{1}{12}$ by more elaborate arguments of the same general type, finding better g(t) functions with properly bounded values of $g'''(t)$, $g^{iv}(t)$, etc. It seems difficult, however, to obtain the conjectured value by this method.


Fig. 1

Bell Telephone Laboratories

Cover sheet for Technical Memorandum

Subject: Concavity of Transmission Rate as a Function of Input Probabilities - Case 2067o

Distribution:
2 - W. Bode
3 - R. Bennett
4 - S. Black
5 - A. DeSoer
6 - N. Gilbert
7 - E. Graham
W. Hagelbarger
L. Kelly
P. Lloyd
A. MacColl
F. Moore
R. Pierce
O. Rice

MM-55-114-23
Date: June 3, 1955
Author: C. E. Shannon

Information Theory

The following theorem is proved: In a discrete noisy channel without memory the rate of transmission R is a concave downward function of the probabilities $P_i$ of the input symbols. Hence any local maximum of R will be the absolute maximum or channel capacity C.

Concavity of Transmission Rate as a Function of Input Probabilities - Case

MM-55-114-23
June 3, 1955


Theorem : 

In a discrete noisy channel without memory, the rate of 
transmission R is a concave downward function of the probabilities 
P i of the input symbols. Hence, any local maximum of R will be 
the absolute maximum or channel capacity C. 
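Proof: We have

$$R = H(y) - H_x(y) = -\sum_i Q_i \log Q_i + \sum_j P_j\,\alpha_j,$$

where the $Q_i$ are the probabilities of the various received symbols and $\alpha_j$ is the conditional entropy of the received symbol when the transmitted symbol is the j-th one. Here $p_j(i)$ denotes the probability that transmitted symbol j is received as symbol i, so that $\alpha_j = -\sum_i p_j(i)\log p_j(i)$.

A condition for concavity of R is that

$$\sum_{jk} R_{jk}\,\Delta P_j\,\Delta P_k, \qquad R_{jk} = \frac{\partial^2 R}{\partial P_j\,\partial P_k},$$

be a negative semi-definite form.* We have

$$\frac{\partial R}{\partial P_j} = -\sum_i \big(1 + \log Q_i\big)\,p_j(i) + \alpha_j,$$

using the fact that $Q_i = \sum_j P_j\,p_j(i)$, and therefore

$$R_{jk} = -\sum_i \frac{p_j(i)\,p_k(i)}{Q_i}.$$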

*See "Inequalities, " Hardy, Littlewood and Polya, Cambridge 1934, 
p. SO. 

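Hence, writing $\Delta Q_i = \sum_j p_j(i)\,\Delta P_j$,

$$\sum_{jk} R_{jk}\,\Delta P_j\,\Delta P_k = -\sum_{ijk}\frac{p_j(i)\,p_k(i)}{Q_i}\,\Delta P_j\,\Delta P_k = -\sum_i \frac{1}{Q_i}\Big(\sum_j p_j(i)\,\Delta P_j\Big)\Big(\sum_k p_k(i)\,\Delta P_k\Big) = -\sum_i \frac{(\Delta Q_i)^2}{Q_i}. \qquad (1)$$

This displays the sum as necessarily non-positive, since all terms are non-positive, and consequently shows that $R_{jk}$ is negative semi-definite and R a concave function. The simplicity of the formula (1) for the second derivative of R in an arbitrary direction is quite striking.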
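Formula (1) lends itself to a direct numerical check. The Python sketch below uses natural logarithms. The channel, the input distribution and the direction ΔP are arbitrary examples, with $\sum_j \Delta P_j = 0$ so that the perturbation stays on the simplex. The sketch compares formula (1) with a central second difference of R and checks concavity along one chord.

```python
import math


def output_probs(P, W):
    """Q_i = sum_j P_j p_j(i), with W[j][i] = p_j(i)."""
    return [sum(P[j] * W[j][i] for j in range(len(P))) for i in range(len(W[0]))]


def rate(P, W):
    """R = H(y) - H_x(y) in nats."""
    Q = output_probs(P, W)
    H_y = -sum(q * math.log(q) for q in Q if q > 0)
    H_x_y = -sum(P[j] * sum(w * math.log(w) for w in W[j] if w > 0) for j in range(len(P)))
    return H_y - H_x_y


def second_derivative_formula(P, W, dP):
    """Formula (1): -sum_i (dQ_i)^2 / Q_i, with dQ_i = sum_j p_j(i) dP_j."""
    Q = output_probs(P, W)
    dQ = output_probs(dP, W)  # the same linear map applied to the direction
    return -sum(dq * dq / q for dq, q in zip(dQ, Q))


def second_derivative_numeric(P, W, dP, h=1e-4):
    """Central second difference of R along the direction dP."""
    plus = [p + h * d for p, d in zip(P, dP)]
    minus = [p - h * d for p, d in zip(P, dP)]
    return (rate(plus, W) - 2 * rate(P, W) + rate(minus, W)) / h ** 2
```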

A corollary to this result is the following. Consider the set s of points $(P_1, P_2, \ldots, P_n)$ with $\sum P_i = 1$ for which R has its maximum value. Normally, of course, there is only one point in the set, but in other cases it is not so limited. Our theorem allows us to deduce that s is always a convex set of points, for if R is maximized at $(P_1, \ldots, P_n)$ and also at $(P_1', \ldots, P_n')$, it must clearly have the same value at $(aP_1 + (1-a)P_1', \ldots, aP_n + (1-a)P_n')$, $0 \le a \le 1$: by concavity R is at least the maximum value there, and it cannot exceed it.





Fig. 2 (drawing B-362338, 110 V DC supply)

Drawing B-362340 (D.T.A., June 30, 1954)

Fig. 5 (D.T.A., June 30, 1954)



The material in these notes has not for the most part been 
published and is for personal use only. The notes are not complete. 
Several key sections are not yet available, consequently there are a 
number of forward and backward references which are quite meaningless. 

The remaining sections will be handed out as soon as available.

The parts of the notes now available are not arranged in the 
correct order for easiest reading. The following rearrangement of sec- 
tions should be made: 

Some Useful Inequalities for Distribution Functions - p. 1a - 3a
A Lower Bound on the Tail of a Distribution - p. 1y - 9y
A Combination Theorem - p. 1m
Some Results on Determinants - p. 1b - 3b
Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements
The Number of Sequences of a Given Length
Characteristic for a Language with Independent Letters
The Probability of Error in Optimal Codes
Page with figures 1, 2 and 3
Zero Error Codes and the Zero Error Capacity - p. 1g - 6g
Theorem - p. 1h - 3h
Figure 4
Lower Bound for P_e for a Completely Connected Channel - p. 2r - 3r
ad for f & - p. 1k - 5k
Application of "Sphere-packing" Bounds to Feedback Case - p. 1p - 3p
Theorem - p. 1q - 4q
Theorem - p. 1j
A Result for the Memoryless Feedback Channel - p. 1r
Continuity of P_e opt as a Function of Transition Probabilities - p. 1e
Codes of a Fixed Composition - p. 1f
Relation of P_e to n - p. 1i - 2i
Bound on P_e for Random Code by Simple Threshold Argument - si - eki
A Bound on P_e for a Random Code - p. 1d - 3d
The Feinstein Bound - pages 1l & 2l
Relations Between Reliability and Minimum Word Separation - p. 12, 22, 62 & 72
Inequalities for Decodable Codes - p. 1n - 3n
Convexity of Channel Capacity as a Function of Transition Probabilities - p. 1o
A Geometric Interpretation of Channel Capacity - p. 1x - 6x
Log Moment Generating Function for the Square of a Gaussian Variate - p. p1 - p2
Upper Bound for Gaussian Channel by Expurgated Random Code - p. s1 - f2
Lower Bound on P_e in Gaussian Channel by Minimum Distance Argument - p. a1 - a2
The Sphere Packing Bound for the Gaussian Power Limited Channel - p. c1 - c5
The T-terminal Channel - p. f1 - f7
Conditions for Constant Mutual Information - p. 1066
Simple Proof - p. 1024

The following errata have been found: 
p. ly line 10 > 1 

line 11 for any positive <^ 

line 14 ^(1 - e p "- 

p. 2y line 8 V, <Y 2 <. . . . 7 % 

p. Jw - lines 1. 2. 4, 7, 8, 9, 13. 17 subscripts on $ should 
be in line. 

p. 2c - line 7 * log Prob 

n p 

4c - Eq. (7) E( 8 ) - -^(s) log - - (ji - su«) 

Eq. (8) R(s) = £^(8) log q i (s)° 1 » n - («-l) 

line 6 dE , dR ^ - n' + six" + n' - . s 
ds ' ds * n 1 + (1-s) u M -u' ~ 

line 2 E(l) « j log p^ 1 + log d 


page 3| - line 3 - log min. jT 

page J*g - line 9 change mar. to min. 
Fig. 4 bottom line - change 3 to 2. 
page 5K equation (l) min 

V 1°U = 1 

I would appreciate knowing any further errors of any sort that are found in the notes. I expect there are a good many. I would also be interested to know of any parts that are particularly difficult to follow and perhaps need rewriting.

Claude E. Shannon 


Bounds on the Tails of Martingales and Related Questions

Claude E. Shannon
Department of Electrical Engineering
Department of Mathematics

Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts

This paper is concerned with the problem of overbounding the probability that the sum of n dependent random variables exceeds a certain quantity. Certain restrictions are assumed concerning the distribution of the ith random variable $x_i$ conditional on the preceding random variables. As an example, we might have a gambler playing some "system" in which $x_i$ is his winning on the ith bet. Suppose he can choose any distribution he desires for $x_i$ conditional on the preceding plays, $x_1, x_2, \ldots, x_{i-1}$, subject however to the conditions 1) it is a "fair" bet, $E(x_i \mid x_{i-1}, \ldots, x_1) = 0$; 2) there is a "house limit" on possible wins or losses for one bet, $\Pr(x_i \le x \mid x_{i-1}, \ldots, x_1) = 0$ for $x < L$ and $\Pr(x_i \le x \mid x_{i-1}, \ldots, x_1) = 1$ for $x \ge W$. It is desired to find an upper bound on the probability that the gambler's winnings will exceed a certain limit X after n bets. This bound will of course be a function of L, W, n and X but is to be independent of the system used.

Thought of another way, we can imagine the gambler mapping out a strategy, subject to the house rules, to try to maximize the probability of ending up after n bets with a total winning of X or more. If this is his object, he would clearly be wise, for example, if he ever reached the level X, not to risk any future loss. This he could do by choosing a distribution function thereafter which is 0 for negative x and 1 for positive x.

We will find a bound for this problem and various other similar 
problems with different side constraints on the allowed distribution 
functions. The results have applications in various problems related 
to random walks, gambler's ruin problems and certain coding problems
in information theory. 

In the example above, the gambler's total capital forms a martingale because of the "fair bet" condition. Bounds on the tails of martingales are known in terms of the variances of the successive amounts won. The bounds we obtain are in terms of conditional moment generating functions. As such, they require more in the way of restrictions on the distributions (for the moment generating functions to exist), but give tighter bounds. Our bounds bear the same relation to the variance type bounds for martingales that the Chernoff bound does to the Chebycheff bound for sums of independent random variables.

The Main Inequality

The method we use is based on a bound for the tail of a distribution due to Chernoff¹. Let P(x) be the distribution function of a random variable x, and suppose that its moment generating function

$$v(s) = \int_{-\infty}^{\infty} e^{sx}\,dP(x)$$

exists over some s interval including the origin in its interior. This will certainly be true, for example, if $P(x) \le e^{ax}$ for some $a > 0$ and sufficiently large negative x, and $1 - P(x) \le e^{-bx}$ for some positive b and sufficiently large positive x.

We first derive a somewhat generalized formulation of the Chernoff bound. Let $\mu(s) = \log v(s)$ be the semi-invariant generating function.

Lemma 1: Suppose the semi-invariant generating function μ(s) for a random variable x exists for $-a < s < b$ and does not exceed another differentiable function of s, $\bar\mu(s)$. Thus $\mu(s) \le \bar\mu(s)$. Then

$$\Pr[x \ge \bar\mu'(s)] \le e^{\bar\mu(s) - s\bar\mu'(s)}, \qquad b > s > 0,$$

$$\Pr[x \le \bar\mu'(s)] \le e^{\bar\mu(s) - s\bar\mu'(s)}, \qquad -a < s < 0.$$

This result is like the Chernoff bound except for replacement of μ(s) by an upper bounding function $\bar\mu(s)$, and may be proved by similar means. Thus by the generalized Chebycheff inequality, for s > 0,

$$e^{sX}\Pr[x \ge X] \le \int_X^{\infty} e^{sx}\,dP(x) \le \int_{-\infty}^{\infty} e^{sx}\,dP(x) = v(s) = e^{\mu(s)} \le e^{\bar\mu(s)}.$$

This is true for any X. Set $X = \bar\mu'(s)$. Then

$$\Pr[x \ge \bar\mu'(s)] \le e^{\bar\mu(s) - s\bar\mu'(s)}.$$

A similar argument gives the dual inequality for negative s.

We now develop a formula for the moment generating function of the sum of a set of dependent random variables, $u = x_1 + x_2 + \cdots + x_n$, where the joint distribution function of $x_1, \ldots, x_n$ is given by $P(x_1, x_2, \ldots, x_n)$. It is also assumed that for this multivariate distribution the moment generating functions for various random variables conditional on others exist. To avoid notational complexity we carry out the proof only for n = 3, using x, y and z for the three random variables, but the method is clearly general. If v(s) is the moment generating function for the sum variable u = x + y + z, then (all integrals are from −∞ to ∞)

$$v(s) = \int e^{sx}\,dP(x)\int e^{sy}\,dP(y \mid x)\int e^{sz}\,dP(z \mid x, y).$$

The innermost integral is the moment generating function for z conditional on x and y, and may be denoted by $v_3(s \mid x, y)$ (the 3 referring to the third variable, z). Thus

$$v(s) = \int e^{sx}\,dP(x)\int e^{sy}\,v_3(s \mid x, y)\,dP(y \mid x).$$

Suppose now that we have a bounding function for $v_3(s \mid x, y)$, say $\gamma_3(s)$, independent of x and y:

$$v_3(s \mid x, y) \le \gamma_3(s).$$

Then the innermost integral may be bounded by $\gamma_3(s)$ and this term taken out of the integration. ($v_3(s \mid x, y)$ is clearly non-negative, being an expectation of $e^{sz}$.) Thus

$$v(s) \le \gamma_3(s)\int e^{sx}\,dP(x)\int e^{sy}\,dP(y \mid x).$$

Similarly, suppose the moment generating function of y conditional on x is bounded by $\gamma_2(s)$,

$$v_2(s \mid x) = \int e^{sy}\,dP(y \mid x) \le \gamma_2(s),$$

and the moment generating function of x is bounded by $\gamma_1(s)$,

$$v_1(s) = \int e^{sx}\,dP(x) \le \gamma_1(s).$$

Then these may also be used to bound the integrals, giving

$$v(s) \le \gamma_1(s)\,\gamma_2(s)\,\gamma_3(s).$$

Taking logarithms, the semi-invariant generating function μ(s) for the sum variable u is therefore bounded by the sum of the logarithms of the γ(s) functions, that is, by uniform bounds $\bar\mu_i(s) = \log\gamma_i(s)$ on the conditional semi-invariant functions of the different variables:

$$\mu(s) \le \bar\mu_1(s) + \bar\mu_2(s) + \bar\mu_3(s).$$


The same argument carries through for the sum of any number of random variables and may be summarized as follows.

Lemma 2: The semi-invariant generating function μ(s) for the sum of n random variables is bounded by

$$\mu(s) \le \sum_{i=1}^{n}\bar\mu_i(s),$$

where $\bar\mu_i(s)$ is a uniform bound on the semi-invariant function for the ith random variable conditional on the first i − 1:

$$\log\int e^{sx_i}\,dP(x_i \mid x_1, x_2, \ldots, x_{i-1}) \le \bar\mu_i(s).$$

In most applications the same bound, say $\bar\mu_0(s)$, will apply to all the random variables. In this case $\mu(s) \le n\bar\mu_0(s)$. Combining Lemmas 1 and 2 we obtain our first main result, a bound on the tail probability of a sum of dependent random variables provided the conditional moment generating functions exist.

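Theorem 1: If u is the sum of n dependent random variables $x_i$ (i = 1, 2, ..., n) whose semi-invariant generating functions conditional on preceding variables, $\mu_i(s \mid x_1, \ldots, x_{i-1})$, exist and are bounded by differentiable functions $\bar\mu_i(s)$ (i = 1, 2, ..., n), then

$$\Pr\Big[u \ge \sum_{i=1}^{n}\bar\mu_i'(s)\Big] \le e^{\sum_{i=1}^{n}[\bar\mu_i(s) - s\bar\mu_i'(s)]}, \qquad s > 0,$$

$$\Pr\Big[u \le \sum_{i=1}^{n}\bar\mu_i'(s)\Big] \le e^{\sum_{i=1}^{n}[\bar\mu_i(s) - s\bar\mu_i'(s)]}, \qquad s < 0.$$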
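Theorem 1 can be compared with an exact tail in the simplest case. If every wager is a fair ±1 coin toss, the variables are independent with $\mu_i(s) = \log\cosh s$ and $\mu_i'(s) = \tanh s$, so a level a is reached by choosing $s = \tanh^{-1}(a/n)$. The Python sketch below, with example parameters of our choosing, evaluates the bound and the exact binomial tail.

```python
import math


def theorem1_bound(n, a):
    """e^{n[mu(s) - s mu'(s)]} with mu(s) = log cosh s and n*tanh(s) = a (needs 0 < a < n)."""
    s = math.atanh(a / n)
    return math.exp(n * (math.log(math.cosh(s)) - s * math.tanh(s)))


def exact_tail(n, a):
    """Pr[sum of n fair +/-1 steps >= a]; the sum equals 2K - n with K ~ Binomial(n, 1/2)."""
    k_min = math.ceil((n + a) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n
```

For n = 100 and a = 20 the bound is about 0.133, while the exact tail is about 0.028; the bound has the right exponential rate but no coefficient term.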



In applications of this result we would generally attempt to find the smallest bounding functions $\bar\mu_i(s)$ in order to obtain the tightest bound on the tail probability. As a first example consider a gambler allowed to choose a wager with an arbitrary distribution function φ(x) (the probability of gaining x or less), subject however to the following conditions:

1) The expected gain is zero: $\int x\,d\phi(x) = 0$.

2) $\phi(x) \le \phi_1(x)$, where $\phi_1(x)$ is a distribution function with negative mean for which $\int e^{sx}\,d\phi_1(x)$ exists for some negative s.

3) $\phi(x) \ge \phi_2(x)$, where $\phi_2(x)$ is a distribution function with positive mean for which $\int e^{sx}\,d\phi_2(x)$ exists for some positive s, and $\phi_2(x) \le \phi_1(x)$.

Thus our gambler is allowed to choose a distribution function at each wager lying between two given curves $\phi_1(x)$ and $\phi_2(x)$ (as suggested by Fig. 1) which approach 0 and 1 with a certain rapidity. He is also constrained to choose a distribution function with zero mean. The situation described earlier involving house limits is a case of this type where the distributions $\phi_1$ and $\phi_2$ are step functions at L and W, the maximum allowed loss or win per wager.

Fig. 1.

To apply the theorem we need a function which bounds the moment generating functions which he can achieve with these restrictions. Consider the distribution function $\phi_0(x)$ defined as follows:

$$\phi_0(x) = \begin{cases}\phi_1(x), & x < \alpha,\\ k, & \alpha \le x \le \beta,\\ \phi_2(x), & x > \beta,\end{cases}$$

where α is the first point at which $\phi_1(x)$ reaches the value k and β is the first point at which $\phi_2(x)$ reaches k. $\phi_0(x)$ is a distribution function, and by adjusting k we can clearly make the mean of the distribution $\phi_0(x)$ equal zero. With this value of k we will show that the moment generating function for any allowed φ(x) is bounded by that for $\phi_0(x)$.

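Since φ(x) and $\phi_0(x)$ have the same mean (namely zero), we have, integrating by parts,

$$0 = \int_{-\infty}^{\infty} x\,d\big(\phi_0(x) - \phi(x)\big) = \Big[x\big(\phi_0(x) - \phi(x)\big)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty}\big(\phi_0(x) - \phi(x)\big)\,dx,$$

so that

$$\int_{-\infty}^{\infty}\big(\phi_0(x) - \phi(x)\big)\,dx = 0,$$

where we use the exponential approach of φ and $\phi_0$ to 0 and 1 as x goes to −∞ and +∞ to insure the vanishing of the term $x(\phi_0(x) - \phi(x))$ at these limits.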


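Now consider the quantity (again using integration by parts)

$$\int_{-\infty}^{\infty} e^{sx}\,d\big(\phi_0(x) - \phi(x)\big) = -s\int_{-\infty}^{\infty} e^{sx}\big(\phi_0(x) - \phi(x)\big)\,dx, \qquad -a < s < b,$$

where a and b are the limits of existence of the moment generating functions of $\phi_1$ and $\phi_2$. Let δ be the first point at which φ(x) reaches the height k of the horizontal segment of $\phi_0$. Since $\phi \le \phi_1 < k$ for $x < \alpha$ and $\phi \ge \phi_2 \ge k$ at β, we have $\alpha \le \delta \le \beta$, and $\phi_0(x) - \phi(x)$ is non-negative for $x < \delta$ and non-positive for $x > \delta$. Split the integral on the right at δ. The first term, $-s\int_{-\infty}^{\delta} e^{sx}(\phi_0 - \phi)\,dx$, is greater than or equal to $-s e^{s\delta}\int_{-\infty}^{\delta}(\phi_0 - \phi)\,dx$ since, when s is positive, $e^{sx} \le e^{s\delta}$ for $x < \delta$, $\phi_0 - \phi$ is non-negative and the coefficient $-s$ is negative; if s is negative, the same inequality follows in a similar way. In the same manner the second term, $-s\int_{\delta}^{\infty} e^{sx}(\phi_0 - \phi)\,dx$, is greater than or equal to $-s e^{s\delta}\int_{\delta}^{\infty}(\phi_0 - \phi)\,dx$, as one verifies by examination of the two cases s > 0 and s < 0, remembering that $\phi_0(x) - \phi(x)$ is negative or zero in this range. Thus we conclude

$$\int_{-\infty}^{\infty} e^{sx}\,d\phi_0(x) - \int_{-\infty}^{\infty} e^{sx}\,d\phi(x) \ge -s e^{s\delta}\int_{-\infty}^{\infty}\big(\phi_0(x) - \phi(x)\big)\,dx = 0.$$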

In other words, the moment generating function for the distribution $\phi_0(x)$ dominates that of any other distribution with the same mean as $\phi_0$ and bounded by the $\phi_1$ and $\phi_2$ curves. Therefore the moment generating function for $\phi_0$ may be used in our bounds for the tail of a sum distribution if the individual conditional distributions satisfy this type of restriction.

Using this bound on the conditional moment generating functions in Theorem 1, our solution may be summarized as follows. Suppose at each play of a game the distribution functions available to a gambler all have zero mean and lie between two functions $\phi_1(x)$ and $\phi_2(x)$. Let $\phi_0(x)$ be the zero mean function consisting of $\phi_1$ followed by a flat segment, followed by $\phi_2$. Let

$$\mu(s) = \log\int_{-\infty}^{\infty} e^{sx}\,d\phi_0(x).$$

Then the probability of his winnings u after n wagers exceeding $n\mu'(s)$ is bounded by

$$\Pr[u \ge n\mu'(s)] \le e^{n[\mu(s) - s\mu'(s)]}, \qquad s > 0.$$

This same bound applies, of course, also with a semi-martingale condition, that is, if the gambler's expectation is only required to be non-positive.

If $\phi_1(0) = 1$ and $\phi_2(0) = 0$ (so the gambler can play a wager that amounts to stopping the game, that is, a distribution which is a unit step at zero), then this same bound applies to the probability of exceeding $n\mu'(s)$ on any of the first n trials. This is because the bound covers all strategies. Any particular strategy could be modified so that if the gambler reaches the level $n\mu'(s)$ at any time before the nth trial he then effectively holds his winnings by playing the distribution with unit step at zero. The bound must exceed the probability of exceeding the level $n\mu'(s)$ for this modified strategy at the nth step, but this is a bound on the probability of ever exceeding the level in the first n steps. This device can be used in many applications of the method we are describing, provided only that the unit step at zero is an allowed distribution function.
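Because the bound covers every strategy, it can be compared with a simulation of any particular one. The Python sketch below assumes house limits of ±1, for which the construction above gives as $\phi_0$ the fair two-point wager at ±1, so that μ(s) = log cosh s. The betting rule (full stake while not ahead, a smaller stake while ahead, and the unit step at zero once the level is reached) is a hypothetical example, and the estimate is Monte Carlo with a fixed seed.

```python
import math
import random


def bound(n, level):
    """Theorem 1 bound with mu(s) = log cosh s and n*tanh(s) = level (needs 0 < level < n)."""
    s = math.atanh(level / n)
    return math.exp(n * (math.log(math.cosh(s)) - s * math.tanh(s)))


def hit_probability(n, level, ahead_stake=0.5, trials=5000, seed=1):
    """Monte Carlo estimate of Pr[winnings reach level within n fair wagers].

    Each wager wins or loses the stake with probability 1/2, and |stake| <= 1 (house limits).
    Hypothetical rule: stake 1 while not ahead, ahead_stake while ahead, and stake 0
    (the unit step at zero) once the level has been reached.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(n):
            if total >= level:
                break  # hold the winnings
            stake = 1.0 if total <= 0 else ahead_stake
            total += stake if rng.random() < 0.5 else -stake
        hits += total >= level
    return hits / trials
```

With a full stake throughout, the walk reaches +20 within 100 wagers with probability about 0.046 by the reflection principle, well under the bound of about 0.133.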

The bound given, while certainly not the best possible for all values of the parameters, is, however, best possible in the coefficient of n in the exponent. That is, the result would be false if $\mu(s) - s\mu'(s)$ in the right hand exponential term were replaced by $\mu(s) - s\mu'(s) - \epsilon$ for any positive ε. This may be seen as follows. The gambler could, within the rules, choose the distribution $\phi_0(x)$ at each wager. If he does so, then we have a sum of n independent random variables, each with semi-invariant generating function μ(s). Lower bounds on the tail of this sum distribution are known to exceed $e^{n[\mu(s) - s\mu'(s) - \epsilon]}$ when n is sufficiently large.

The Case with House Limits on Win or Loss for each Wager 

For the case of the gambler who can choose an arbitrary distribution with zero mean and house limits on wins and losses W and L (L < 0) respectively, the distribution to maximize μ(s) is, from the above analysis, a binomial distribution with jumps at the ends of the interval, W and L, adjusted to give a zero mean. The two probabilities are $W/(W - L)$ at L and $-L/(W - L)$ at W.
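The claim that the extreme two-point wager dominates can be spot-checked numerically. The Python sketch below compares $E[e^{sx}]$ for the zero-mean two-point distribution on {L, W} with several other zero-mean wagers inside the house limits. The limits L = −2, W = 3 and the competing distributions are arbitrary choices. For bounded wagers the inequality also follows from convexity of $e^{sx}$, since the chord through L and W lies above the curve.

```python
import math


def mgf(dist, s):
    """E[e^{sx}] for a finite distribution given as {value: probability}."""
    return sum(p * math.exp(s * x) for x, p in dist.items())


def extreme_wager(L, W):
    """Zero-mean two-point distribution: W/(W - L) at L and -L/(W - L) at W."""
    return {L: W / (W - L), W: -L / (W - L)}


L, W = -2.0, 3.0
others = [
    {0.0: 1.0},                         # no bet at all
    {-1.0: 0.5, 1.0: 0.5},              # fair unit coin
    {-2.0: 0.25, 0.0: 0.5, 2.0: 0.25},  # symmetric three-point wager
    {-0.5: 6 / 7, 3.0: 1 / 7},          # skewed wager, mean -3/7 + 3/7 = 0
]
```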


To gain a little in generality and simplify notation, consider a binomial with probability p at the value L and probability $q = 1 - p$ at W. The semi-invariant generating function is

$$\mu(s) = \log\big(pe^{sL} + qe^{sW}\big), \qquad \mu'(s) = \frac{pLe^{sL} + qWe^{sW}}{pe^{sL} + qe^{sW}}.$$

The expression for the bound on the tail may be simplified by a change of variables eliminating s. Let

$$\lambda = \frac{pe^{sL}}{pe^{sL} + qe^{sW}}, \qquad \eta = 1 - \lambda = \frac{qe^{sW}}{pe^{sL} + qe^{sW}},$$

so that

$$\frac{\lambda}{\eta} = \frac{p}{q}\,e^{s(L - W)}, \qquad s = \frac{1}{L - W}\log\frac{\lambda q}{\eta p}, \qquad \mu'(s) = \lambda L + \eta W.$$

Then

$$\mu(s) - s\mu'(s) = \log\big(pe^{sL} + qe^{sW}\big) - s(\lambda L + \eta W) = \lambda\log\frac{p}{\lambda} + \eta\log\frac{q}{\eta}.$$
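The change of variables is easy to get wrong by a sign, so a numerical check may be useful. The Python sketch below evaluates $\mu(s) - s\mu'(s)$ and $\mu'(s)$ both directly and through λ and η. The test values of p, L, W and s are arbitrary; the identity does not require a zero mean.

```python
import math


def exponent_direct(p, L, W, s):
    """(mu(s) - s*mu'(s), mu'(s)) for the binomial with probability p at L and q = 1 - p at W."""
    q = 1 - p
    denom = p * math.exp(s * L) + q * math.exp(s * W)
    mu = math.log(denom)
    mu_prime = (p * L * math.exp(s * L) + q * W * math.exp(s * W)) / denom
    return mu - s * mu_prime, mu_prime


def exponent_lambda(p, L, W, s):
    """The same pair computed through lambda = p e^{sL}/(p e^{sL} + q e^{sW}) and eta = 1 - lambda."""
    q = 1 - p
    lam = p * math.exp(s * L) / (p * math.exp(s * L) + q * math.exp(s * W))
    eta = 1 - lam
    return lam * math.log(p / lam) + eta * math.log(q / eta), lam * L + eta * W
```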

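Letting p equal $W/(W - L)$ and q equal $-L/(W - L)$ and using our result bounding the tail of the sum of n random variables, we obtain the following bound for the probability of the gambler exceeding a certain level after n wagers:

$$\Pr[u \ge n(\lambda L + \eta W)] \le \left[\left(\frac{W}{\lambda(W - L)}\right)^{\lambda}\left(\frac{-L}{\eta(W - L)}\right)^{\eta}\right]^{n}, \qquad 0 < \lambda < p, \quad \eta = 1 - \lambda.$$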

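If L = −W, that is, the win and loss limits are the same, this formula can be simplified somewhat at the expense of a certain weakening. It then becomes

$$\Pr[u \ge nW(1 - 2\lambda)] \le \left[\frac{\lambda^{-\lambda}\,\eta^{-\eta}}{2}\right]^{n}.$$

Let $\lambda = \tfrac{1}{2}(1 - \theta)$, $\eta = \tfrac{1}{2}(1 + \theta)$. Then

$$\Pr[u \ge nW\theta] \le \Big[(1 + \theta)^{-(1+\theta)}(1 - \theta)^{-(1-\theta)}\Big]^{n/2} = e^{-\frac{n}{2}\left[(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta)\right]}.$$

Consider the bracketed term in the exponent and expand the logarithms as series:

$$(1+\theta)\ln(1+\theta) + (1-\theta)\ln(1-\theta) = (1+\theta)\Big(\theta - \frac{\theta^2}{2} + \frac{\theta^3}{3} - \cdots\Big) + (1-\theta)\Big(-\theta - \frac{\theta^2}{2} - \frac{\theta^3}{3} - \cdots\Big)$$

$$= \theta^2 + \frac{\theta^4}{6} + \frac{\theta^6}{15} + \cdots + \frac{\theta^{2k}}{k(2k-1)} + \cdots \ge \theta^2.$$

Hence

$$\Pr[u \ge nW\theta] \le e^{-n\theta^2/2}.$$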
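The series expansion and the final weakening can be checked numerically as well. The Python sketch below compares the closed form of the bracketed term with the partial sum of $\theta^{2k}/(k(2k-1))$. It also confirms that $e^{-n\theta^2/2}$ is never smaller than the sharper bound it replaces. The values of θ and n are arbitrary.

```python
import math


def bracket(theta):
    """(1 + theta) ln(1 + theta) + (1 - theta) ln(1 - theta), for |theta| < 1."""
    return (1 + theta) * math.log(1 + theta) + (1 - theta) * math.log(1 - theta)


def bracket_series(theta, terms=60):
    """Partial sum of theta^(2k) / (k (2k - 1)) for k = 1, ..., terms."""
    return sum(theta ** (2 * k) / (k * (2 * k - 1)) for k in range(1, terms + 1))


def tail_bounds(n, theta):
    """(sharper bound e^{-(n/2) bracket}, weakened bound e^{-n theta^2 / 2})."""
    return math.exp(-n / 2 * bracket(theta)), math.exp(-n * theta ** 2 / 2)
```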

It may be noted that this bound is similar to the exponential part of
the normal approximation to the sum of $n$ binomial samples, with probabilities
$\frac{1}{2}$ at $\pm W$, without, however, the coefficient term that would ordinarily
appear. This might suggest that the gambler's best strategy to maximize
the probability of exceeding $nW\theta$ would be to play the extreme
binomial distribution continually, or at least until he was within $W$ of it, and then
switch to a binomial which would just carry him over the limit if he won.
While this appears to be a rather good strategy, it is not quite optimal,
as a study of small $n$ values reveals. Determining the optimal strategy
appears to involve considerable combinatorial complexity.

The Probability of Ever Exceeding a Limit with a Negative Expectation

Suppose now that the conditional expectation of all wagers is negative
and we are interested in a bound on the probability of ever (in an infinite
series of wagers) exceeding a certain (positive) value. If the expectation
were zero, then by well-known results in the gambler's ruin problem the
only bound is unity, provided, for example, the gambler can play a binomial
distribution. With a negative mean, however, significant bounds can be
obtained as follows.

We consider again the case where the allowed distribution functions
must lie between two given distribution functions $\Phi_1(x)$ and $\Phi_2(x)$, but now
must have a mean $m < 0$. The maximum $\mu(s)$ is obtained by the same
construction using $\Phi_1$ and $\Phi_2$, but with the horizontal segment placed
to give the mean $m$.

If $\Phi(0)$ is 1, then $\Phi_2(0)$ must have been 1, and no allowed bet whatever
will ever give a positive return. Thus clearly the probability of ever
exceeding any positive bound is zero. We will therefore assume that
$\Phi(0) < 1$. This assumption also excludes $\Phi(x)$ being a unit step, since the
step would have to occur at the negative number $m$, making $\Phi(0)$ equal 1.

Under the assumption $\Phi(0) < 1$, the $\mu(s)$ curve has the general form
shown in Fig. 2.


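The curve is convex downward; it passes through zero at $s = 0$ with
a negative slope $m$; it has a unique minimum at $s = s_1$ (say); and it passes
through zero again at $s_0 > s_1$. These facts follow readily from the relations

$$\begin{aligned}
v(s) &= \int e^{sx}\,d\Phi(x), & v(0) &= \int d\Phi(x) = 1,\\
v'(s) &= \int xe^{sx}\,d\Phi(x), & v'(0) &= \int x\,d\Phi(x) = m,\\
v''(s) &= \int x^2e^{sx}\,d\Phi(x), & &\\
\mu(s) &= \ln v(s), & \mu(0) &= 0,\\
\mu'(s) &= \frac{v'(s)}{v(s)}, & \mu'(0) &= m,\\
\mu''(s) &= \frac{v(s)v''(s) - v'(s)^2}{v(s)^2}. & &
\end{aligned}$$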

The numerator of $\mu''(s)$ is positive by the Schwarz inequality,

$$\left[\int xe^{sx}\,d\Phi(x)\right]^2 \le \int e^{sx}\,d\Phi(x)\int x^2e^{sx}\,d\Phi(x),$$

with equality (a zero numerator) only for a unit step, which has been excluded. Hence the $\mu$ curve
is strictly convex downward. Also, for sufficiently large positive $s$,
$v(s)$ will exceed 1 and $\mu(s)$ will be positive, since $\Phi(0) < 1$.
Consequently, the minimum at $s = s_1$ and the positive zero crossing at
$s = s_0$ both exist.
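For a finite discrete distribution the points $s_1$ and $s_0$ can be located by bisection, since $\mu'$ is increasing and $\mu$ is increasing beyond $s_1$. This is a sketch under stated assumptions: the wager (win 1 with probability 0.4, lose 1 with probability 0.6) is a hypothetical example, and the search limit `hi` must be large enough that $\mu(\mathrm{hi}) > 0$ without overflow. For this particular wager one can check by hand that $s_1 = \frac{1}{2}\ln\frac{0.6}{0.4}$ and $s_0 = \ln\frac{0.6}{0.4}$.

```python
import math

def v(s, values, probs):
    """v(s) = sum_i p_i e^{s x_i} for a finite discrete distribution."""
    return sum(p * math.exp(s * x) for x, p in zip(values, probs))

def mu(s, values, probs):
    return math.log(v(s, values, probs))

def mu_prime(s, values, probs):
    """mu'(s) = v'(s) / v(s)."""
    v1 = sum(p * x * math.exp(s * x) for x, p in zip(values, probs))
    return v1 / v(s, values, probs)

def bisect(f, lo, hi, iters=200):
    """Root of an increasing f on [lo, hi]; assumes f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def find_s1_s0(values, probs, hi=20.0):
    """s1: minimum of mu (mu'(s1) = 0); s0 > s1: positive zero of mu.

    Assumes a negative mean, Phi(0) < 1, mu(hi) > 0, and no overflow in e^{hi x}.
    """
    s1 = bisect(lambda s: mu_prime(s, values, probs), 0.0, hi)
    s0 = bisect(lambda s: mu(s, values, probs), s1, hi)
    return s1, s0

# Hypothetical wager: win 1 with probability 0.4, lose 1 with probability 0.6.
s1, s0 = find_s1_s0([1.0, -1.0], [0.4, 0.6])
print(s1, s0, mu(s1, [1.0, -1.0], [0.4, 0.6]))
```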

Suppose we are interested in a bound on the probability of ever
reaching or exceeding $A$ with the sums $u_1 = x_1,\ u_2 = x_1 + x_2,\ \ldots,\ u_n = x_1 + x_2 + \cdots + x_n,\ \ldots$. We have

$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{\infty}\Pr[u_n \ge A].$$


From our above results, $\Pr[u_n \ge A] \le e^{n[\mu(s) - s\mu'(s)]}$ for the $s$ such that
$A = n\mu'(s)$. The particular $n$ for which this bound is largest may be
obtained by maximizing $n[\mu(s) - s\mu'(s)]$ given $A = n\mu'(s)$, or, in other
words, by maximizing $A\left[\frac{\mu(s)}{\mu'(s)} - s\right]$. Since $\mu''(s) > 0$, this maximum exists
and occurs at a unique $s$ found by differentiation (the derivative is $-A\mu(s)\mu''(s)/\mu'(s)^2$), namely, the $s$ for which
$\mu(s) = 0$. This $s$ is the $s_0$ of Fig. 2, and the corresponding $n$ we call
$n_0$. Thus $s_0$ and $n_0$ satisfy

$$\mu(s_0) = 0, \qquad n_0\,\mu'(s_0) = A.$$
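The maximization can be confirmed numerically: scanning $s > s_1$ and maximizing $A[\mu(s)/\mu'(s) - s]$ should land on $s_0$, where the maximum value is $-s_0A$ and $n_0 = A/\mu'(s_0)$. The ±1 wager with win probability 0.4 and the value $A = 10$ are assumptions chosen for illustration; for this wager $\mu'(s_0) = 0.2$, so $n_0 = 50$.

```python
import math

WIN = 0.4  # hypothetical: win +1 with probability 0.4, lose 1 with probability 0.6

def mu(s):
    return math.log(WIN * math.exp(s) + (1 - WIN) * math.exp(-s))

def mu_prime(s):
    a, b = WIN * math.exp(s), (1 - WIN) * math.exp(-s)
    return (a - b) / (a + b)

def exponent(s, A):
    """n[mu(s) - s mu'(s)] with n = A / mu'(s), i.e. A (mu(s)/mu'(s) - s); needs mu'(s) > 0."""
    return A * (mu(s) / mu_prime(s) - s)

s1 = 0.5 * math.log((1 - WIN) / WIN)  # mu'(s1) = 0 for this particular wager
s0 = math.log((1 - WIN) / WIN)        # mu(s0) = 0 for this particular wager
A = 10.0
grid = [s1 + 1e-4 * k for k in range(1, 20000)]
s_best = max(grid, key=lambda s: exponent(s, A))
n0 = A / mu_prime(s0)
print(s_best, s0, exponent(s_best, A), -s0 * A, n0)
```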

In general, $n_0$ will not be an integer, but the bound obtained by
evaluation at $n_0$ and $s_0$ certainly is greater than that for any integer
points. Hence, for any particular $n$,

$$\Pr[u_n \ge A] \le e^{n_0[\mu(s_0) - s_0\mu'(s_0)]} = e^{-s_0A}.$$


Now consider the $s_1$ where $\mu'(s_1) = 0$ (Fig. 2), and $n_1$ defined by

$$n_1\,\mu(s_1) = -s_0A.$$

Again, in general, $n_1$ will not be an integer. We let, however, $[n_1]$ denote
the largest integer contained in $n_1$.

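Returning to our inequality on the probability of $u_n$ ever exceeding $A$,
we may rewrite it as follows. For $n \le [n_1]$ we use the bound $e^{-s_0A}$; for larger $n$, since $A > 0$, $\Pr[u_n \ge A] \le \Pr[u_n \ge 0] \le e^{n\mu(s_1)}$. Thus

$$\begin{aligned}
\Pr[\text{any } u_n \ge A] &\le \sum_{n=1}^{\infty}\Pr[u_n \ge A] = \sum_{n=1}^{[n_1]}\Pr[u_n \ge A] + \sum_{n=[n_1]+1}^{\infty}\Pr[u_n \ge A]\\
&\le n_1e^{-s_0A} + \sum_{n=[n_1]+1}^{\infty}e^{n\mu(s_1)} = n_1e^{-s_0A} + \frac{e^{([n_1]+1)\mu(s_1)}}{1 - e^{\mu(s_1)}}\\
&\le n_1e^{-s_0A} + \frac{e^{n_1\mu(s_1)}}{1 - e^{\mu(s_1)}},
\end{aligned}$$

and, since $\mu(s_1) < 0$ and $e^{n_1\mu(s_1)} = e^{-s_0A}$,

$$\Pr[\text{any } u_n \ge A] \le e^{-s_0A}\left[n_1 + \frac{1}{1 - e^{\mu(s_1)}}\right].$$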
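For a ±1 wager won with probability $w < \frac{1}{2}$ everything is explicit: $s_0 = \ln\frac{1-w}{w}$, $e^{\mu(s_1)} = 2\sqrt{w(1-w)}$, and the exact probability of ever reaching a positive integer $A$ is the classical gambler's-ruin value $(w/(1-w))^A = e^{-s_0A}$. The sketch below compares that with the bound, assuming $n_1$ is defined by $n_1\mu(s_1) = -s_0A$; the example wager and the helper names are hypothetical.

```python
import math

def negative_mean_bound(w, A):
    """e^{-s0 A} [n1 + 1/(1 - e^{mu(s1)})] for a +1/-1 wager won with probability w < 1/2.

    Assumes n1 is defined by n1 * mu(s1) = -s0 * A.
    """
    s0 = math.log((1 - w) / w)                    # mu(s0) = 0
    mu_s1 = math.log(2 * math.sqrt(w * (1 - w)))  # minimum value of mu, at s1 = s0 / 2
    n1 = -s0 * A / mu_s1
    return math.exp(-s0 * A) * (n1 + 1 / (1 - math.exp(mu_s1)))

def exact_ever_reach(w, A):
    """Classical gambler's-ruin result: Pr[ever reach +A] = (w / (1 - w))^A, integer A >= 1."""
    return (w / (1 - w)) ** A

for A in (1, 5, 20):
    print(A, exact_ever_reach(0.4, A), negative_mean_bound(0.4, A))
```

The bound exceeds the exact value by the bracketed factor, which grows only linearly in $A$.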

This is our desired bound. It is essentially exponentially decreasing
in $A$, since $n_1$ grows only linearly with $A$. In fact, a more refined analysis can be given to show that the bracketed
term can be replaced by a more involved expression which does not increase
with $A$.


Chernoff, H. (1952). A Measure of Asymptotic Efficiency for Tests of a
Hypothesis Based on the Sum of Observations. Ann. Math. Stat. 23.



Some Useful Inequalities for Distribution Functions

In this section a number of inequalities will be given which are
useful in estimating the "tails" of distribution functions or other
related statistics.

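Binomial Inequalities: Let

$$G = \frac{1}{\sqrt{2\pi n\lambda\mu}}\cdot\frac{1}{\lambda^{\lambda n}\mu^{\mu n}}. \tag{1}$$

Then

$$G\exp\left[-\left(\frac{1}{12\lambda n} + \frac{1}{12\mu n}\right)\right] \le \binom{n}{\lambda n} \le G, \tag{2}$$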
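Inequality (2) is easy to test exhaustively for moderate $n$. This sketch takes (1) and (2) to read $G = \lambda^{-\lambda n}\mu^{-\mu n}/\sqrt{2\pi n\lambda\mu}$ and $G\exp[-(\frac{1}{12\lambda n}+\frac{1}{12\mu n})] \le \binom{n}{\lambda n} \le G$, with $\lambda n = k$ an integer strictly between 0 and $n$; `math.comb` supplies the exact coefficient, and the helper names are our own.

```python
import math

def G_binomial(n, k):
    """G of (1) with lam = k/n, mu = 1 - lam: lam^{-lam n} mu^{-mu n} / sqrt(2 pi n lam mu)."""
    lam, mu = k / n, (n - k) / n
    log_g = -k * math.log(lam) - (n - k) * math.log(mu)
    return math.exp(log_g) / math.sqrt(2 * math.pi * n * lam * mu)

def check_binomial_bounds(n, k):
    """True if G exp(-(1/(12 lam n) + 1/(12 mu n))) <= C(n, k) <= G, with lam n = k, mu n = n - k."""
    g = G_binomial(n, k)
    lower = g * math.exp(-(1 / (12 * k) + 1 / (12 * (n - k))))
    return lower <= math.comb(n, k) <= g

print(math.comb(10, 3), G_binomial(10, 3))
```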


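where $\lambda + \mu = 1$ and neither $\lambda$ nor $\mu$ is zero. (Note that if either is
zero, $G$ is undefined.) Similar inequalities hold for the terms of a
binomial distribution, $\binom{n}{\lambda n}p^{\lambda n}q^{\mu n}$, and may be obtained by multiplying
the above inequalities by $p^{\lambda n}q^{\mu n}$. They may also be generalized to the
multinomial coefficient:

$$G = \frac{1}{(2\pi n)^{(s-1)/2}}\cdot\frac{1}{\sqrt{\lambda_1\lambda_2\cdots\lambda_s}}\cdot\frac{1}{\lambda_1^{\lambda_1 n}\lambda_2^{\lambda_2 n}\cdots\lambda_s^{\lambda_s n}}, \tag{3}$$

$$G\exp\left(-\sum_{i=1}^{s}\frac{1}{12\lambda_i n}\right) \le \frac{n!}{(\lambda_1 n)!\,(\lambda_2 n)!\cdots(\lambda_s n)!} \le G, \tag{4}$$

where $s$ is the number of components, $\sum_i \lambda_i = 1$, and none of the $\lambda_i$ vanishes.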
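The multinomial version can be checked the same way, working with logarithms so that large factorials stay manageable. The sketch assumes (3) and (4) take the form $G = (2\pi n)^{-(s-1)/2}(\lambda_1\cdots\lambda_s)^{-1/2}\prod_i \lambda_i^{-\lambda_i n}$ and $G\exp(-\sum_i \frac{1}{12\lambda_i n}) \le n!/\prod_i(\lambda_i n)! \le G$, with every count $\lambda_i n \ge 1$; the function names are hypothetical.

```python
import math

def log_multinomial(counts):
    """Exact log of n! / (c_1! c_2! ... c_s!), computed with integer arithmetic first."""
    value = math.factorial(sum(counts))
    for c in counts:
        value //= math.factorial(c)
    return math.log(value)

def log_G(counts):
    """log G with lam_i = c_i / n: (2 pi n)^{-(s-1)/2} (prod lam_i)^{-1/2} prod lam_i^{-lam_i n}."""
    n, s = sum(counts), len(counts)
    lams = [c / n for c in counts]
    return (-(s - 1) / 2 * math.log(2 * math.pi * n)
            - 0.5 * sum(math.log(lam) for lam in lams)
            - sum(c * math.log(lam) for c, lam in zip(counts, lams)))

def multinomial_bounds_hold(counts):
    """Check (4) in log form: log G - sum 1/(12 c_i) <= log multinomial <= log G."""
    upper = log_G(counts)
    lower = upper - sum(1 / (12 * c) for c in counts)
    return lower <= log_multinomial(counts) <= upper

print(multinomial_bounds_hold((3, 5, 7)))
```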

The "tail" of a binomial distribution may be estimated by the following:

$$\sum_{k \ge \lambda n}\binom{n}{k}p^kq^{n-k} \le \frac{1}{1 - \frac{\mu p}{\lambda q}}\,G\,p^{\lambda n}q^{\mu n}, \qquad \lambda > p, \tag{5}$$


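$$\sum_{k \ge \lambda n}\binom{n}{k}p^kq^{n-k} \le \left(\frac{p}{\lambda}\right)^{\lambda n}\left(\frac{q}{\mu}\right)^{\mu n}, \qquad \lambda > p. \tag{6}$$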
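Both tail estimates can be compared with the exact binomial tail. In this sketch (6) is the Chernoff form $(p/\lambda)^{\lambda n}(q/\mu)^{\mu n}$; for (5) we assume the first-term bound $Gp^{\lambda n}q^{\mu n}$ multiplied by the geometric-series factor $1/(1-\mu p/(\lambda q))$, which follows because the ratio of successive terms $\binom{n}{k+1}p^{k+1}q^{n-k-1}\big/\binom{n}{k}p^kq^{n-k}$ is at most $\mu p/(\lambda q)$ for $k \ge \lambda n$. Helper names are our own, and $\lambda = k_0/n$ must exceed $p$.

```python
import math

def tail(n, k0, p):
    """Exact upper tail Pr[at least k0 successes] of a Binomial(n, p)."""
    q = 1 - p
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(k0, n + 1))

def chernoff(n, k0, p):
    """Inequality (6): (p/lam)^{lam n} (q/mu)^{mu n}, lam = k0/n, mu = 1 - lam, lam > p."""
    lam, q = k0 / n, 1 - p
    mu = 1 - lam
    return (p / lam) ** k0 * (q / mu) ** (n - k0)

def geometric(n, k0, p):
    """First-term bound G p^{lam n} q^{mu n} times 1/(1 - mu p/(lam q)); assumed form of (5)."""
    lam, q = k0 / n, 1 - p
    mu = 1 - lam
    G = lam ** (-k0) * mu ** (-(n - k0)) / math.sqrt(2 * math.pi * n * lam * mu)
    return G * p**k0 * q ** (n - k0) / (1 - mu * p / (lam * q))

n, p, k0 = 60, 0.3, 27
print(tail(n, k0, p), geometric(n, k0, p), chernoff(n, k0, p))
```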

The first of these gives a closer estimate of the tail but is somewhat
more complex. The inequality (6) (Chernoff) is often convenient because
of its simplicity. Lower bounds for tails may be taken to be merely lower
bounds for the first term, as in the lower inequalities of (2) or (4).

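We shall now prove the inequalities (1) and (2). The Stirling
approximation for $n!$ is as follows:

$$n! \sim \sqrt{2\pi}\,n^{n+1/2}e^{-n}\exp\left(\frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \cdots\right).$$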

It is known that if no terms of the series are taken, $n!$ is underestimated;
if only the $\frac{1}{12n}$ term is taken, then $n!$ is overestimated; and so on. We
wish to overestimate $n!/[(\lambda n)!\,(\mu n)!]$. This will be done if the numerator
is overestimated and the denominator underestimated. Thus we may write

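$$n! \le \sqrt{2\pi}\,n^{n+1/2}e^{-n}e^{1/(12n)},$$

$$\frac{n!}{(\lambda n)!\,(\mu n)!} \le \frac{1}{\sqrt{2\pi n\lambda\mu}}\cdot\frac{1}{\lambda^{\lambda n}\mu^{\mu n}}\exp\left[\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3}\right].$$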

We wish to show that the exp term is less than or equal to one or, which
is the same thing, that its argument is less than or equal to zero.
One or the other of $\lambda$, $\mu$ is the greater. By symmetry, we may assume
without loss of generality that it is $\lambda$, that is, $\lambda \ge \mu$. Then

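$$\frac{1}{360(\lambda n)^3} \le \frac{1}{360(\mu n)^3},$$

and since $\mu n$ is a positive integer,

$$\frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3} \le \frac{2}{360(\mu n)^3} \le \frac{2}{360\,\mu n}.$$

Further, $\frac{1}{12n} - \frac{1}{12\lambda n} \le 0$, since $\lambda n \le n$. Using these, we have

$$\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3} \le \left(\frac{1}{12n} - \frac{1}{12\lambda n}\right) - \left(\frac{1}{12\mu n} - \frac{2}{360\,\mu n}\right) \le 0.$$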
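The two facts this proof relies on can be spot-checked numerically: successive truncations of the Stirling series alternately under- and overestimate $\log n!$ (here taken from `math.lgamma(n + 1)`), and the argument of the exp term is nonpositive for every split $\lambda n + \mu n = n$ into positive integers. This is a finite check, not a proof, and the helper names are hypothetical.

```python
import math

def stirling_log(n, terms):
    """log of sqrt(2 pi) n^{n + 1/2} e^{-n}, plus the first `terms` series corrections (terms <= 3)."""
    corrections = [1 / (12 * n), -1 / (360 * n**3), 1 / (1260 * n**5)]
    return 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n + sum(corrections[:terms])

def exp_argument(n, k):
    """Argument of the exp term for lam n = k and mu n = n - k (both positive integers)."""
    a, b = k, n - k
    return (1 / (12 * n) - 1 / (12 * a) - 1 / (12 * b)
            + 1 / (360 * a**3) + 1 / (360 * b**3))

# log n! from math.lgamma(n + 1) lies between successive truncations of the series.
for n in (1, 5, 20):
    print(n, stirling_log(n, 0), math.lgamma(n + 1), stirling_log(n, 1))
```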

This proves the upper bound (2). The lower bound is found similarly by 
underestimating the numerator and overestimatin